ETAPS Foreword


Welcome to the 24th ETAPS! ETAPS 2021 was originally planned to take place in
Luxembourg, in its beautiful capital, Luxembourg City. Because of the COVID-19
pandemic, this was changed to an online event.

ETAPS 2021 was the 24th instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each 
conference has its own Program Committee (PC) and its own Steering Committee 
(SC). The conferences cover various aspects of software systems, ranging from theoretical
computer science to foundations of programming languages, analysis tools, and
formal approaches to software engineering. Organising these conferences in a coherent, 
highly synchronised conference programme enables researchers to participate in an 
exciting event, having the possibility to meet many colleagues working in different 
directions in the field, and to easily attend talks of different conferences. On the 
weekend before the main conference, numerous satellite workshops take place that 
attract many researchers from all over the globe. 

ETAPS 2021 received 260 submissions in total, 115 of which were accepted, 
yielding an overall acceptance rate of 44.2%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions,
and in particular the PC (co-)chairs for their hard work in running this entire
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2021 featured the unifying invited speakers Scott Smolka (Stony Brook 
University) and Jane Hillston (University of Edinburgh) and the conference-specific 
invited speakers Isil Dillig (University of Texas at Austin) for ESOP and Willem Visser 
(Stellenbosch University) for FASE. Invited tutorials were provided by Erika Abraham
(RWTH Aachen University) on analysis of hybrid systems and Madhusudan
Parthasarathy (University of Illinois at Urbana-Champaign) on combining machine
learning and formal methods. 

ETAPS 2021 was originally supposed to take place in Luxembourg City, Luxembourg,
organized by the SnT - Interdisciplinary Centre for Security, Reliability and
Trust, University of Luxembourg. The University of Luxembourg was founded in 2003.
The university is one of the best and most international young universities with 6,700 
students from 129 countries and 1,331 academics from all over the globe. The local 
organisation team consisted of Peter Y.A. Ryan (general chair), Peter B. Roenne
(organisation chair), Joaquin Garcia-Alfaro (workshop chair), Magali Martin (event
manager), David Mestel (publicity chair), and Alfredo Rial (local proceedings chair). 

ETAPS 2021 was further supported by the following associations and societies: 
ETAPS e.V., EATCS (European Association for Theoretical Computer Science), 
EAPLS (European Association for Programming Languages and Systems), and EASST 
(European Association of Software Science and Technology).

The ETAPS Steering Committee consists of an Executive Board and representatives
of the individual ETAPS conferences, as well as representatives of EATCS,
EAPLS, and EASST. The Executive Board consists of Holger Hermanns
(Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofron (Prague), Barbara König
(Duisburg), Gerald Lüttgen (Bamberg), Caterina Urban (INRIA), Tarmo Uustalu
(Reykjavik and Tallinn), and Lenore Zuck (Chicago).

Other members of the steering committee are: Patricia Bouyer (Paris), Einar Broch 
Johnsen (Oslo), Dana Fisman (Be’er Sheva), Jan Friso Groote (Eindhoven), Esther 
Guerra (Madrid), Reiko Heckel (Leicester), Joost-Pieter Katoen (Aachen and Twente), 
Stefan Kiefer (Oxford), Fabrice Kordon (Paris), Jan Křetínský (Munich), Kim G. 
Larsen (Aalborg), Tiziana Margaria (Limerick), Andrew M. Pitts (Cambridge), Grigore 
Rosu (Illinois), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Lutz Schröder 
(Erlangen), Ilya Sergey (Singapore), Mariëlle Stoelinga (Twente), Gabriele Taentzer
(Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), 
Anton Wijs (Eindhoven), Manuel Wimmer (Linz), and Nobuko Yoshida (London). 

I’d like to take this opportunity to thank all the authors, attendees, organizers of the
satellite workshops, and Springer-Verlag GmbH for their support. I hope you all 
enjoyed ETAPS 2021. 

Finally, a big thanks to Peter, Peter, Magali and their local organisation team for all 
their enormous efforts to make ETAPS a fantastic online event. I hope there will be a 
next opportunity to host ETAPS in Luxembourg. 


February 2021 Marieke Huisman 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


TACAS 2021 was the 27th edition of the International Conference on Tools and 
Algorithms for the Construction and Analysis of Systems conference series. TACAS 
2021 was part of the 24th European Joint Conferences on Theory and Practice of 
Software (ETAPS 2021), which, although originally planned to take place in
Luxembourg City, was held as an online event from March 27 to April 1 due to the
COVID-19 pandemic.

TACAS is a forum for researchers, developers, and users interested in rigorously 
based tools and algorithms for the construction and analysis of systems. The conference 
aims to bridge the gaps between different communities with this common interest and 
to support them in their quest to improve the utility, reliability, flexibility, and
efficiency of tools and algorithms for building computer-controlled systems. There were
four types of submissions for TACAS: 


— Research papers advancing the theoretical foundations for the construction and 
analysis of systems. 

— Case study papers with an emphasis on a real-world setting. 

— Regular tool papers presenting a new tool, a new tool component, or novel 
extensions to an existing tool and requiring an artifact submission. 

— Tool demonstration papers focusing on the usage aspects of tools, also subject to the 
artifact submission requirement. 


This year 141 papers were submitted to TACAS, consisting of 90 research papers, 
29 regular tool papers, 16 tool demo papers, and 6 case study papers. Authors were 
allowed to submit up to four papers. Each paper was reviewed by three Program 
Committee (PC) members, who made extensive use of subreviewers. 

Similarly to previous years, it was possible to submit an artifact alongside a paper, 
which was mandatory for regular tool and tool demo papers. An artifact might consist 
of a tool, models, proofs, or other data required for validation of the results of the 
paper. The Artifact Evaluation Committee (AEC) was tasked with reviewing the 
artifacts, based on their documentation, ease of use, and, most importantly, whether the 
results presented in the corresponding paper could be accurately reproduced. Most 
of the evaluation was carried out using a standardised virtual machine to ensure
consistency of the results, except for those artifacts that had special hardware requirements.

The evaluation consisted of two rounds. The first round was carried out in parallel 
with the work of the PC. The judgment of the AEC was communicated to the PC and 
weighed in their discussion. The second round took place after paper acceptance 
notifications were sent out; authors of accepted research papers who did not submit an 
artifact in the first round could submit their artifact here. In total, 72 artifacts were 
submitted (63 in the first round and 9 in the second), of which 57 were accepted and 15 
rejected. This corresponds to an acceptance rate of 79 percent. Papers with an accepted 
artifact include a badge on the first page.

Selected authors were requested to provide a rebuttal for both papers and artifacts in
case a review gave rise to questions. In total 166 rebuttals were provided. Using the 
review reports and rebuttals the Programme and the Artifact Evaluation Committees 
extensively discussed the papers and artifacts and ultimately decided to accept 32 
research papers, 7 tool papers, 6 tool demos, and 2 case studies. 

Besides the regular conference papers, this two-volume proceedings also contains 8 
short papers that describe the participating verification systems and a competition 
report presenting the results of the 10th SV-COMP, the competition on automatic 
software verifiers for C and Java programs. These papers were reviewed by a separate 
program committee (PC); each of the papers was assessed by at least three reviewers. 
A total of 30 verification systems with developers from 11 countries entered the
systematic comparative evaluation, including four submissions from industry. Two
sessions in the TACAS program were reserved for the presentation of the results: (1) a
summary by the competition chair and of the participating tools by the developer teams 
in the first session, and (2) an open community meeting in the second session.


Abstract. We introduce a generalization of the bisimulation game that
can be employed to find all relevant distinguishing Hennessy–Milner logic
formulas for two compared finite-state processes. By measuring the use of
expressive powers, we adapt the formula generation to just yield formulas
belonging to the coarsest distinguishing behavioral preorders/equivalences
from the linear-time–branching-time spectrum. The induced algorithm
can determine the best fit of (in)equivalences for a pair of processes.


Keywords: Process equivalence spectrum · Distinguishing formulas ·
Bisimulation games.


1 Introduction 


Have you ever looked at two system models and wondered what would be the finest 
notions of behavioral equivalence to equate them—or, conversely: the coarsest 
ones to distinguish them? We run into this situation often when analyzing models 
and, especially, when devising examples for teaching. We then find ourselves 
fiddling around with whiteboards and various tools, each implementing different 
equivalence checkers. Would it not be nice to decide all equivalences at once? 


Example 1. Consider the following CCS process $P_1 = a.(b + c) + a.d$. It describes
a machine that can be activated ($a$) and then either is in a state where one can
choose from $b$ and $c$ or where it can only be deactivated again ($d$). $P_1$ shares
a lot of properties with $P_2 = a.(b + d) + a.(c + d)$. For example, they have the
same traces (and the same completed traces). Thus, they are (completed) trace
equivalent.

But they also have differences. For instance, $P_1$ has a run where it executes $a$
and then cannot do $d$, while $P_2$ does not have such a run. Hence, they are not
failure equivalent. Moreover, $P_1$ may perform $a$ and then choose from $b$ and $c$,
and $P_2$ cannot. This renders the two processes also not simulation equivalent.
Failure equivalence and simulation equivalence are incomparable, that is, neither
one follows from the other one. Both are coarsest ways of telling the processes
apart. Other inequivalences, like bisimulation inequivalence, are implied by both.
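
The observations of Ex. 1 can be checked mechanically. The following Python sketch encodes both processes as one small transition system and compares traces, completed traces, and refusals after a trace. The state names, the dictionary encoding, and the helper names are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Transition system as a dictionary: state -> list of (action, successor).
# The state names are hypothetical labels for the CCS subterms of Ex. 1.
LTS: Dict[str, List[Tuple[str, str]]] = {
    "P1": [("a", "b+c"), ("a", "d")],
    "P2": [("a", "b+d"), ("a", "c+d")],
    "b+c": [("b", "0"), ("c", "0")],
    "b+d": [("b", "0"), ("d", "0")],
    "c+d": [("c", "0"), ("d", "0")],
    "d": [("d", "0")],
    "0": [],
}

Trace = Tuple[str, ...]


def traces(state: str, completed: bool = False) -> Set[Trace]:
    """Traces from `state`; with completed=True only those ending in a stuck state.

    The recursion terminates only for acyclic systems such as the one above.
    """
    moves = LTS[state]
    result: Set[Trace] = set()
    if not completed or not moves:
        result.add(())
    for action, successor in moves:
        for rest in traces(successor, completed):
            result.add((action,) + rest)
    return result


def states_after(state: str, trace: Trace) -> Set[str]:
    """All states reachable from `state` by performing `trace`."""
    current = {state}
    for action in trace:
        current = {s2 for s in current for (a, s2) in LTS[s] if a == action}
    return current


def can_refuse_after(state: str, trace: Trace, action: str) -> bool:
    """True if some state reached via `trace` has no `action` transition."""
    return any(
        all(a != action for a, _ in LTS[s]) for s in states_after(state, trace)
    )


print(traces("P1") == traces("P2"))              # same traces
print(traces("P1", True) == traces("P2", True))  # same completed traces
print(can_refuse_after("P1", ("a",), "d"))       # P1 may refuse d after a
print(can_refuse_after("P2", ("a",), "d"))       # P2 may not
```

The last two lines reflect the failure inequivalence described above: only $P_1$ has a run that executes $a$ and then cannot do $d$.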


In the following, we present a uniform game-based way of finding the most fitting 
notions of (in)equivalence for process pairs like in Ex. 1. 


© The Author(s) 2021

Our approach is based on the fact that notions of process equivalence can be
characterized by two-player games. The defender’s winning region in the game
corresponds to pairs of equivalent states, and the attacker’s winning strategies
correspond to distinguishing formulas of Hennessy–Milner logic (HML).

Each notion of equivalence in van Glabbeek’s famous linear-time–branching-time
spectrum [10] can be characterized by a subset of HML with specific
distinguishing power. Some of the notions are incomparable. So, often a process
pair that is equivalent with respect to one equivalence is distinguished by a
set of slightly coarser or incomparable equivalences, without any one of them
alone being the coarsest way to distinguish the pair. As with the spectrum of
light, where a mix of wavelengths appears to us as a color, there is a “mix” of
distinguishing capabilities involved in establishing whether a specific equivalence
is finest. Our algorithm is meant to analyze what is in the mix.


Contributions. This paper makes the following contributions: 


— We introduce a special bisimulation game that neatly characterizes the 
distinguishing formulas of HML for pairs of states in finite transition systems 
(Subsections 3.1 and 3.2).

— We show how to enumerate the relevant distinguishing formulas using the 
attacker’s winning region (Subsection 3.3). 

— We give a way of constructing a finite set of distinguishing formulas guaranteed 
to contain observations of the weakest possible observation languages, which 
can be seen as a “spectroscopy” of the differences between two processes 
(Subsection 3.4). 

— We present a small web tool that is able to run the algorithm on finite-state 
processes and output a visual representation of the game (Section 4). We 
also report on the distinctions it finds for all the finitary examples from the 
report version of the linear-time—branching-time spectrum [12]. 


We frame the contributions by a roundtrip through the basics of HML, games and 
the spectrum (Section 2), a discussion of related work (Section 5), and concluding 
remarks on future lines of research (Section 6). 


2 Preliminaries: HML, Games, and the Spectrum 


We use the concepts of transition systems, games, observations, and notions of
equivalence, largely in the wake of Hennessy and Milner’s seminal paper [14].


2.1 Transition Systems and Hennessy–Milner Logic


Labeled transition systems capture a discrete world view, where there is a current 
state and a branching structure of possible state changes to future states. 


Definition 1 (Labeled transition system). A labeled transition system is a
tuple $S = (P, \Sigma, \rightarrow)$ where $P$ is the set of states, $\Sigma$ is the set of actions, and
${\rightarrow} \subseteq P \times \Sigma \times P$ is the transition relation.

Hennessy–Milner logic [14] describes finite observations (or “tests”) that one can
perform on such a system.


Definition 2 (Hennessy–Milner logic). Given an alphabet $\Sigma$, the syntax of
Hennessy–Milner logic formulas, $\mathrm{HML}[\Sigma]$, is inductively defined as follows:


Observations If $\varphi \in \mathrm{HML}[\Sigma]$ and $a \in \Sigma$, then $\langle a \rangle\varphi \in \mathrm{HML}[\Sigma]$.

Conjunctions If $\varphi_i \in \mathrm{HML}[\Sigma]$ for all $i$ from an index set $I$, then
$\bigwedge_{i \in I} \varphi_i \in \mathrm{HML}[\Sigma]$.

Negations If $\varphi \in \mathrm{HML}[\Sigma]$, then $\neg\varphi \in \mathrm{HML}[\Sigma]$.


We often just write $\bigwedge\{\varphi_0, \varphi_1, \ldots\}$ for $\bigwedge_{i \in I} \varphi_i$. $\top$ denotes $\bigwedge\emptyset$, the nil-element of
the syntax tree, and $\langle a \rangle$ is a short-hand for $\langle a \rangle\top$. Let us also implicitly assume
that formulas are flattened in the sense that conjunctions do not contain other
conjunctions as immediate subformulas. We will sometimes talk about the syntax
tree height of a formula and consider the height of $\top$ to equal 0.

Intuitively, $\langle a \rangle\varphi$ means that one can observe a system transition labeled by $a$
and then continue to make observation(s) $\varphi$. Conjunction and negation work as
known from propositional logic. We will provide a common game semantics for
HML in the following subsection.


2.2 Games Semantics of HML 


Let us fix some notions for Gale-Stewart-style reachability games where the 
defender wins all infinite plays. 


Definition 3 (Games). A simple reachability game $\mathcal{G}[g_0] = (G, G_d, \rightarrowtail, g_0)$
consists of


— a set of game positions $G$, partitioned into
  • a set of defender positions $G_d \subseteq G$
  • and attacker positions $G_a := G \setminus G_d$,
— a graph of game moves ${\rightarrowtail} \subseteq G \times G$, and
— an initial position $g_0 \in G$.


Definition 4 (Plays and wins). We call the paths $g_0 g_1 \ldots \in G^{\infty}$ with $g_i \rightarrowtail g_{i+1}$
plays of $\mathcal{G}[g_0]$. The defender wins infinite plays. If a finite play $g_0 \ldots g_n$ is
stuck, the stuck player loses: The defender wins if $g_n \in G_a$, and the attacker wins
if $g_n \in G_d$.


Definition 5 (Strategies and winning strategies). A (positional, nondeterministic)
strategy is a subset of the moves, $F \subseteq {\rightarrowtail}$. If (fairly) picking elements
of strategy $F$ ensures a player to win, $F$ is called a winning strategy for this
player. The player with a winning strategy for $\mathcal{G}[g_0]$ is said to win $\mathcal{G}[g_0]$.


Definition 6 (Winning regions). The set $W_a \subseteq G$ of all positions $g$ where
the attacker wins $\mathcal{G}[g]$ is called the attacker winning region (defender winning
region $W_d$ analogous).

All Gale-Stewart-style reachability games are determined, that is, $W_a \cup W_d = G$.
The winning regions of finite simple reachability games can be computed in time
linear in the number of game moves (cf. [13]). This is why the spectroscopy game
of this paper can easily be used in algorithms. It derives from the following game.
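
The linear-time computation of winning regions can be sketched as a backward propagation from the stuck defender positions. The Python sketch below is one standard way to do this and is not taken from [13]; the function name, the encoding of moves as pairs, and the example game are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def attacker_winning_region(
    positions: Iterable[Hashable],
    defender_positions: Set[Hashable],
    moves: List[Tuple[Hashable, Hashable]],
) -> Set[Hashable]:
    """Attacker winning region of a finite simple reachability game (Defs. 3-6).

    Every move is inspected a constant number of times, so the running time
    is linear in the number of positions plus moves.
    """
    positions = list(positions)
    # For defender positions: number of moves not yet known to lead into W_a.
    remaining: Dict[Hashable, int] = {g: 0 for g in positions}
    predecessors: Dict[Hashable, List[Hashable]] = {g: [] for g in positions}
    for g, h in moves:
        remaining[g] += 1
        predecessors[h].append(g)

    # Stuck defender positions are immediately won by the attacker.
    won = {g for g in positions if g in defender_positions and remaining[g] == 0}
    queue = deque(won)
    while queue:
        h = queue.popleft()
        for g in predecessors[h]:
            if g in won:
                continue
            if g in defender_positions:
                # The defender loses only once all of their moves are lost.
                remaining[g] -= 1
                if remaining[g] > 0:
                    continue
            won.add(g)
            queue.append(g)
    return won


# Hypothetical example game: D, E, S are defender positions, the rest attacker positions.
example_positions = ["A", "B", "C", "D", "E", "F", "S", "T"]
example_defender = {"D", "E", "S"}
example_moves = [("A", "D"), ("D", "S"), ("D", "T"), ("C", "C"),
                 ("B", "S"), ("E", "S"), ("F", "E")]
print(sorted(attacker_winning_region(example_positions, example_defender, example_moves)))
```

In the example, the defender can escape from D to the stuck attacker position T, and the attacker can only loop at C, so neither A, C, nor D belongs to the attacker winning region.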


Definition 7 (HML game). For a transition system $S = (P, \Sigma, \rightarrow)$, the HML
game $\mathcal{G}^S_{\mathrm{HML}}[g_0] = (G, G_d, \rightarrowtail, g_0)$ is played on $G = P \times \mathrm{HML}[\Sigma]$, where the
defender controls observations and negated conjunctions, that is $(p, \langle a \rangle\varphi) \in G_d$
and $(p, \neg\bigwedge_{i \in I} \varphi_i) \in G_d$ (for all $\varphi, p, I$), and the attacker controls the rest. There
are five kinds of moves:


— $(p, \langle a \rangle\varphi) \rightarrowtail (p', \varphi)$ if $p \xrightarrow{a} p'$,
— $(p, \neg\langle a \rangle\varphi) \rightarrowtail (p', \neg\varphi)$ if $p \xrightarrow{a} p'$,
— $(p, \bigwedge_{i \in I} \varphi_i) \rightarrowtail (p, \varphi_i)$ with $i \in I$,
— $(p, \neg\bigwedge_{i \in I} \varphi_i) \rightarrowtail (p, \neg\varphi_i)$ with $i \in I$, and
— $(p, \neg\neg\varphi) \rightarrowtail (p, \varphi)$.


Like in other logical games in the Ehrenfeucht-Fraïssé tradition, the attacker
plays the conjunctions and universal quantifiers, whereas the defender plays the
disjunctions and existential quantifiers. For instance, $(p, \langle a \rangle\varphi)$ is declared as
defender position, since $\langle a \rangle\varphi$ is meant to become true precisely if there exists a
state $p'$ reachable via $p \xrightarrow{a} p'$ where $\varphi$ is true.

As every move strictly reduces the height of the formula, the game must be
finite-depth (and cycle-free), and, for image-finite systems and formulas, also
finite. It is determined and the following semantics is total.


Definition 8 (HML semantics). For a transition system $S = (P, \Sigma, \rightarrow)$, the
semantics of HML is given by defining that $\varphi$ is true at $p$ in $S$, written $[\![\varphi]\!]^S_p$, iff
the defender wins $\mathcal{G}^S_{\mathrm{HML}}[(p, \varphi)]$.


Example 2. Continuing Ex. 1, $[\![\langle a \rangle\neg\langle d \rangle\top]\!]^S_{P_2}$ is false: No matter whether the
defender plays to $(b + d, \neg\langle d \rangle\top)$ or to $(c + d, \neg\langle d \rangle\top)$, the attacker wins by moving
to the stuck defender position $(0, \neg\top)$. (Recall that $\top$ is the empty conjunction
and that 0 is the completed process!)


2.3 The Spectrum of Behavioral Equivalences 


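Definition 9 (Distinguishing formula). A formula $\varphi$ distinguishes state $p$
from $q$ iff $[\![\varphi]\!]_p$ is true and $[\![\varphi]\!]_q$ is not.¹


Example 3. $\langle a \rangle\neg\langle d \rangle\top$ distinguishes $P_1$ from $P_2$ in Ex. 1 (but not the other way
around). $\langle a \rangle\bigwedge\{\langle b \rangle\top, \langle d \rangle\top\}$ distinguishes $P_2$ from $P_1$.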
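
Examples 2 and 3 can be replayed with a small evaluator. The Python sketch below evaluates HML formulas by direct recursion over the formula instead of building the HML game; for the finite formulas and the finite system used here this yields the usual HML truth values. The tuple encoding of formulas, the state names, and all helper names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Transition system of Ex. 1; state names are hypothetical labels for CCS subterms.
LTS: Dict[str, List[Tuple[str, str]]] = {
    "P1": [("a", "b+c"), ("a", "d")],
    "P2": [("a", "b+d"), ("a", "c+d")],
    "b+c": [("b", "0"), ("c", "0")],
    "b+d": [("b", "0"), ("d", "0")],
    "c+d": [("c", "0"), ("d", "0")],
    "d": [("d", "0")],
    "0": [],
}

# Formulas as nested tuples: ("obs", a, phi), ("and", (phi_1, ...)), ("not", phi).
TOP = ("and", ())  # the empty conjunction


def obs(action, phi=TOP):
    """<a>phi, with <a> as short-hand for <a>T."""
    return ("obs", action, phi)


def conj(*phis):
    return ("and", tuple(phis))


def neg(phi):
    return ("not", phi)


def holds(p: str, phi) -> bool:
    """Truth of phi at state p, by recursion on the syntax tree."""
    kind = phi[0]
    if kind == "obs":
        _, action, sub = phi
        return any(holds(p2, sub) for (a, p2) in LTS[p] if a == action)
    if kind == "and":
        return all(holds(p, sub) for sub in phi[1])
    if kind == "not":
        return not holds(p, phi[1])
    raise ValueError(f"unknown formula kind: {kind!r}")


def distinguishes(phi, p: str, q: str) -> bool:
    """Def. 9: phi is true at p and not true at q."""
    return holds(p, phi) and not holds(q, phi)


phi_failure = obs("a", neg(obs("d")))               # <a> not <d> T
phi_simulation = obs("a", conj(obs("b"), obs("d")))  # <a> /\ {<b>T, <d>T}
print(holds("P2", phi_failure))                     # Ex. 2
print(distinguishes(phi_failure, "P1", "P2"))       # Ex. 3, first formula
print(distinguishes(phi_simulation, "P2", "P1"))    # Ex. 3, second formula
```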


Definition 10 (Observational preorders and equivalences). A set of observations,
$\mathcal{O}_X \subseteq \mathrm{HML}[\Sigma]$, preorders two states $p, q$, written $p \preceq_X q$, iff no
formula $\varphi \in \mathcal{O}_X$ distinguishes $p$ from $q$. If $p \preceq_X q$ and $q \preceq_X p$, then the two are
$X$-equivalent, written $p \equiv_X q$.


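¹ In the following, we usually leave the transition system $S$ implicit.

Definition 11 (Linear-time–branching-time languages [12]). The linear-time–branching-time
spectrum is a lattice of observation languages (and of entailed
process preorders and equivalences). Every observation language $\mathcal{O}_X$ can perform
trace observations, that is, $\top \in \mathcal{O}_X$ and, if $\varphi \in \mathcal{O}_X$, then $\langle a \rangle\varphi \in \mathcal{O}_X$. At the
more linear-time side of the spectrum we have:


— trace observations $\mathcal{O}_T$: Just trace observations,

— failure observations $\mathcal{O}_F$: $\bigwedge_{i \in I} \neg\langle a_i \rangle \in \mathcal{O}_F$,

— readiness observations $\mathcal{O}_R$: $\bigwedge_{i \in I} \varphi_i \in \mathcal{O}_R$ with each $\varphi_i$ of form $\neg\langle a_i \rangle$ or $\langle a_i \rangle$,

— failure trace observations $\mathcal{O}_{FT}$: $\bigwedge_{i \in I} \varphi_i \in \mathcal{O}_{FT}$ with $\varphi_0 \in \mathcal{O}_{FT}$ and, for $i > 0$,
$\varphi_i = \neg\langle a_i \rangle$,

— ready trace observations $\mathcal{O}_{RT}$: $\bigwedge_{i \in I} \varphi_i \in \mathcal{O}_{RT}$ with $\varphi_0 \in \mathcal{O}_{RT}$ and, for $i > 0$,
$\varphi_i$ of form $\neg\langle a_i \rangle$ or $\langle a_i \rangle$,

— impossible futures $\mathcal{O}_{IF}$: $\bigwedge_{i \in I} \neg\varphi_i \in \mathcal{O}_{IF}$ with all $\varphi_i \in \mathcal{O}_T$, and

— possible futures $\mathcal{O}_{PF}$: $\bigwedge_{i \in I} \psi_i \in \mathcal{O}_{PF}$ with all $\psi_i \in \{\neg\varphi_i, \varphi_i\}$ and $\varphi_i \in \mathcal{O}_T$.²


At the more branching-time side, we have simulation observations. Every simulation
observation language $\mathcal{O}_{XS}$ has full conjunctive capacity, that is, if $\varphi_i \in \mathcal{O}_{XS}$
for all $i \in I$, then $\bigwedge_{i \in I} \varphi_i \in \mathcal{O}_{XS}$.


— simulation observations $\mathcal{O}_{1S}$: Just simulation (and trace) observations,

— n-nested simulation observations $\mathcal{O}_{nS}$: $\neg\varphi \in \mathcal{O}_{nS}$ with $\varphi \in \mathcal{O}_{(n-1)S}$,

— ready simulation observations $\mathcal{O}_{RS}$: $\neg\langle a \rangle \in \mathcal{O}_{RS}$, and

— bisimulation observations $\mathcal{O}_B$: The same as $\mathcal{O}_{\infty S}$, which is exactly $\mathrm{HML}[\Sigma]$.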


The observation languages of the spectrum differ in how many of the syntactic 
features of HML one will encounter when descending into a formula’s syntax tree. 
We will come back to this in Subsection 3.4. 

Note that we consider $\bigwedge\{\varphi\}$ to be an alias for $\varphi$. With this aliasing, all the
listed observation languages are closed in the sense that all subformulas of an 
observation are themselves part of that language. They thus are inductive in the 
sense that all observations must be built from observations of the same language 
with lower syntax tree height. 


3 Distinguishing Formula Games 


This section introduces our main contribution: the spectroscopy game (Def. 13), 
and how to build all interesting distinguishing HML formulas from its winning 
region (Def. 14). To justify our construction and to prove that we indeed find 
distinguishing formulas (Thm. 1), let us first examine the formula preorder game 
(Def. 12), which is closer to the problem whether formulas are (non-)distinguishing. 


² Like Kučera and Esparza [17], who studied the properties of “good” observation
languages, we gloss over completed trace, completed simulation and possible worlds
observations here, because these observations need a special exhaustive $\bigwedge_{a \in \Sigma}$. While
it could be provided for with additional operators, it would add another case in each
of the upcoming definitions and would break the closure property of observation
languages, without giving much in return.

3.1 The Formula Preorder Game


Def. 10 entails a straightforward way of turning the problem whether a set of
observations $\mathcal{O} \subseteq \mathcal{O}_X$ preorders two states $p, q$ into a game: Have the attacker
pick a supposedly distinguishing formula $\varphi \in \mathcal{O}$, and then have the defender
choose whether to play the HML game (Def. 7) for $[\![\neg\varphi]\!]_p$ or for $[\![\varphi]\!]_q$. This direct
route will yield infinite games for infinite $\mathcal{O}$, and all the languages from Def. 11
are infinite!

To bypass the infinity issue, we will introduce a variation of this game where 
the attacker gradually chooses their attacking formula. In particular, this means 
that the attacker now decides which observations to play. In return, the defender 
does not need to pick a side in the beginning and may postpone the decision where 
(on the right-hand side) an observation leads. Postponing decisions here means 
that the defender may play non-deterministically, moving to multiple states at 
once. The mechanics are analogous to the standard powerset construction when 
transforming non-deterministic finite automata into deterministic ones. 


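Definition 12 (Formula preorder game). For a transition system $S = (P, \Sigma, \rightarrow)$
and a set of observations $\mathcal{O}_X$, the formula preorder game $\mathcal{G}^S_X[g_0] =
(G, G_d, \rightarrowtail, g_0)$ consists of

— attacker positions $(p, Q, \mathcal{O})_a \in G_a$ with $p \in P$, $Q \in 2^P$, and $\mathcal{O} \subseteq \mathcal{O}_X$,
— defender conjunction positions $(p, Q, \mathcal{O})_d \in G_d$ where the defender has to
answer to conjunction challenges, and
— defender negation positions $(p, Q, \mathcal{O})_{\neg} \in G_d$ where the defender has to answer
to negation challenges,


and five kinds of moves


— observation moves $(p, Q, \mathcal{O})_a \rightarrowtail (p', Q', \mathcal{O}')_a$,
if $p \xrightarrow{a} p'$ with $Q' = \{q' \mid \exists q \in Q.\ q \xrightarrow{a} q'\}$ and $\mathcal{O}' = \{\varphi \mid \langle a \rangle\varphi \in \mathcal{O}\}$,
— conjunct challenges $(p, Q, \mathcal{O})_a \rightarrowtail (p, Q, \{\varphi_i \mid i \in I\})_d$,
if $\bigwedge_{i \in I} \varphi_i \in \mathcal{O}$,
— conjunct answers $(p, Q, \mathcal{O})_d \rightarrowtail (p, \{q\}, \mathcal{O})_a$,
if $q \in Q$,
— negation challenges $(p, Q, \mathcal{O})_a \rightarrowtail (p, Q, \{\varphi\})_{\neg}$,
if $\neg\varphi \in \mathcal{O}$, and
— negation answers $(p, Q, \mathcal{O})_{\neg} \rightarrowtail (q, \{p\}, \mathcal{O})_a$,
if $q \in Q$.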


The formula preorder game precisely characterizes whether an observation lan- 
guage is distinguishing: 

Lemma 1. For a closed observation language $\mathcal{O}_X$, the formula preorder game
$\mathcal{G}^S_X[(p, Q, \mathcal{O})_a]$ with $\mathcal{O} \subseteq \mathcal{O}_X$ is won by the defender precisely if, for every
observation $\varphi \in \mathcal{O}$ with $[\![\varphi]\!]_p$, there is a $q \in Q$ such that $[\![\varphi]\!]_q$.


Proof (Sketch). By induction over the height of formulas in $\mathcal{O}_X$ with arbitrary
$p$ and $Q$, and strengthening the induction predicate to not only consider $\varphi$ but
also partial conjunctions $\bigwedge \mathcal{O}''$ with $\mathcal{O}'' \subseteq \mathcal{O}'$ whenever $\varphi = \bigwedge \mathcal{O}'$. To prove the
right-to-left direction, exploiting the determinacy of the game is convenient.


Figure 1. Schematic spectroscopy game $\mathcal{G}_\triangle$ of Def. 13. Boxes stand for attacker
positions, circles for defender positions, arrows for moves. From the dashed boxes, the
moves are analogous to the ones of the connected solid positions.


3.2 The Spectroscopy Game 


Let us now remove the formulas from the formula game (Def. 12). The idea is
to look at the game for the whole of HML, called $\mathcal{G}_B$. Only attack moves in the
formula game change the current set of observations, and they are completely
guided by the context-free grammar of HML (Def. 2). Therefore, we can³ assume
$\mathcal{O}$ to equal $\mathrm{HML}[\Sigma]$ in every reachable position of $\mathcal{G}_B$. Effectively, $\mathcal{O}$ can be
canceled out of the game, without losing any information. We call the remaining
game the “spectroscopy game.” Figure 1 gives a graphical representation.


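Definition 13 (Spectroscopy game). For a transition system $S = (P, \Sigma, \rightarrow)$,
the $L$-labeled spectroscopy game $\mathcal{G}^S_\triangle[g_0] = (G, G_d, \rightarrowtail, g_0)$ with $L = \{\neg, \wedge, *, \langle a \rangle\}$
consists of


— attacker positions $(p, Q)_a \in G_a$ with $p \in P$, $Q \in 2^P$,
— defender positions $(p, Q)_d \in G_d$ where the defender has to answer to conjunction
challenges,


and four kinds of moves:


— observation moves $(p, Q)_a \overset{\langle a \rangle}{\rightarrowtail} (p', \{q' \mid \exists q \in Q.\ q \xrightarrow{a} q'\})_a$ if $p \xrightarrow{a} p'$,
— conjunct challenges $(p, Q)_a \overset{\wedge}{\rightarrowtail} (p, Q)_d$,
— conjunct answers $(p, Q)_d \overset{*}{\rightarrowtail} (p, \{q\})_a$ if $q \in Q$, and
— negation moves $(p, \{q\})_a \overset{\neg}{\rightarrowtail} (q, \{p\})_a$.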


We have already introduced two tricks in this definition to ease formula reconstruction
in the next subsection. (1) The attack moves are labeled with the
syntactic constructs from which they originate. This does not change expressive
power. (2) Negation moves are restricted to situations where $Q = \{q\}$. After all,
winning attacker strategies will pay attention to only playing a negation after
minimizing the odds of being put on a bad position, anyways.


³ To be precise: Finite conjunctions may only lead to arbitrarily large subsets of
$\mathrm{HML}[\Sigma]$. If the attacker has a way of winning by playing a conjunction, we can as
well approximate this move as playing $\bigwedge \mathrm{HML}[\Sigma]$.

Note that, like in the formula game with arbitrary-depth formulas, the attacker 
could force infinite plays by cycling through conjunction moves (and also negation 
moves). However, they will not do this, as infinite plays are won by the defender. 
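
The four kinds of moves of Def. 13 translate almost literally into code. The following Python sketch builds the part of the spectroscopy game reachable from a start position and decides the winner with a naive fixed-point iteration (simple, but not linear-time). The position encoding, the state names, the extra state `d'`, and all function names are assumptions for illustration and do not describe the paper's tool.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Transition system of Ex. 1, plus a hypothetical state d' that behaves exactly like d.
LTS: Dict[str, List[Tuple[str, str]]] = {
    "P1": [("a", "b+c"), ("a", "d")],
    "P2": [("a", "b+d"), ("a", "c+d")],
    "b+c": [("b", "0"), ("c", "0")],
    "b+d": [("b", "0"), ("d", "0")],
    "c+d": [("c", "0"), ("d", "0")],
    "d": [("d", "0")],
    "d'": [("d", "0")],
    "0": [],
}

# Positions (owner, p, Q) with owner "att" for (p, Q)_a and "def" for (p, Q)_d.
Position = Tuple[str, str, FrozenSet[str]]


def moves_from(pos: Position) -> List[Tuple[str, Position]]:
    """Labeled moves leaving `pos`, following the four cases of Def. 13."""
    owner, p, Q = pos
    if owner == "def":
        # Conjunct answers: the defender picks one q from Q.
        return [("*", ("att", p, frozenset({q}))) for q in sorted(Q)]
    result: List[Tuple[str, Position]] = [("and", ("def", p, Q))]  # conjunct challenge
    for a, p2 in LTS[p]:
        # Observation move: Q follows all a-transitions at once (subset construction).
        Q2 = frozenset(q2 for q in Q for (b, q2) in LTS[q] if b == a)
        result.append((f"<{a}>", ("att", p2, Q2)))
    if len(Q) == 1:
        # Negation move: swap sides, only allowed for singleton Q.
        (q,) = Q
        result.append(("not", ("att", q, frozenset({p}))))
    return result


def attacker_wins(start: Position) -> bool:
    """True if `start` lies in the attacker winning region of the reachable game."""
    positions: Set[Position] = {start}
    todo = [start]
    while todo:
        for _, h in moves_from(todo.pop()):
            if h not in positions:
                positions.add(h)
                todo.append(h)
    won: Set[Position] = set()
    changed = True
    while changed:
        changed = False
        for g in positions - won:
            targets_won = [h in won for _, h in moves_from(g)]
            # Attacker needs one winning move; a defender position is lost if all moves are lost.
            wins = any(targets_won) if g[0] == "att" else all(targets_won)
            if wins:
                won.add(g)
                changed = True
    return start in won


print(attacker_wins(("att", "P1", frozenset({"P2"}))))
print(attacker_wins(("att", "d", frozenset({"d'"}))))
```

Under these assumptions, the attacker wins from $(P_1, \{P_2\})_a$, whereas the defender wins for the pair of identically behaving states `d` and `d'`.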


Lemma 2. The spectroscopy game $\mathcal{G}_\triangle[(p, \{q\})_a]$ is won by the defender precisely
if $p$ and $q$ are bisimilar.


This fact is a corollary of the well-known Hennessy–Milner theorem (HML
characterizes bisimilarity), given that $\mathcal{G}_\triangle$ is constructed as a simplification of $\mathcal{G}_B$.
Comparing $\mathcal{G}_\triangle$ to the standard bisimulation game from the literature (with
symmetry moves, see e.g. [3]), we can easily transfer attacker strategies from there.
In the standard game, the attacker will play $(p, q) \rightarrowtail (a, p', q)$ with $p \xrightarrow{a} p'$ and
the defender has to answer by $(a, p', q) \rightarrowtail (p', q')$ with $q \xrightarrow{a} q'$. In the spectroscopy
game, the attacker can enforce analogous moves by playing
$(p, \{q\})_a \overset{\langle a \rangle}{\rightarrowtail} (p', Q')_a \overset{\wedge}{\rightarrowtail} (p', Q')_d$,
which will make the defender pick $(p', Q')_d \overset{*}{\rightarrowtail} (p', \{q'\})_a$.

The opposite direction of transfer is not so easy, as the attacker has more
ways of winning in $\mathcal{G}_\triangle$. But this asymmetry is precisely why we have to use the
spectroscopy game instead of the standard bisimulation game if we want to learn
about, for example, interesting failure-trace attacks.

Due to the subset construction over $P$, the game size clearly is exponential in
the size of the state space. Going exponential is necessary, as we want to also
characterize weaker preorders like the trace preorder, where exponential $P$-subset
or $\Sigma^*$-word constructions cannot be circumvented. However, for moderate
real-world systems, such constructions will not necessarily show their full exponential
blow-up (cf. [6]).

For concrete implementations, the subset construction also means that the
cost of storing game nodes and of comparing two nodes is linear in the state space
size. Complexity-wise, this factor is dominated by the overall exponentialities.


3.3 Building Distinguishing Formulas from Attacker Strategies 


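Definition 14 (Strategy formulas). Given an attacker strategy $F \subseteq (G_a \times
L \times G)$ for the spectroscopy game $\mathcal{G}_\triangle$, the set of strategy formulas, $\mathrm{Strat}_F(g_a)$,
is inductively defined by:


— If $\varphi \in \mathrm{Strat}_F(g_a')$ and $(g_a, \langle b \rangle, g_a') \in F$, then $\langle b \rangle\varphi \in \mathrm{Strat}_F(g_a)$,
— if $\varphi \in \mathrm{Strat}_F(g_a')$ and $(g_a, \neg, g_a') \in F$, then $\neg\varphi \in \mathrm{Strat}_F(g_a)$, and
— if $\varphi_{g_a'} \in \mathrm{Strat}_F(g_a')$ for all $g_a' \in I = \{g_a' \mid g_d \overset{*}{\rightarrowtail} g_a'\}$, and $(g_a, \wedge, g_d) \in F$,
then $\bigwedge_{g_a' \in I} \varphi_{g_a'} \in \mathrm{Strat}_F(g_a)$.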


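Example 4. The attacks $(P_1, \{P_2\})_a \overset{\langle a \rangle}{\rightarrowtail} (b + c, \{b + d, c + d\})_a \overset{\wedge}{\rightarrowtail} \overset{*}{\rightarrowtail} \overset{\neg}{\rightarrowtail} \overset{\langle d \rangle}{\rightarrowtail} \overset{\wedge}{\rightarrowtail} (0, \emptyset)_d$
give rise to the formula $\langle a \rangle\bigwedge\{\neg\langle d \rangle\top\}$, which can be written as $\langle a \rangle\neg\langle d \rangle$.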
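
Def. 14 can be turned into a bounded enumeration. The Python sketch below collects all strategy formulas that use at most `depth` attacker moves along each branch, restricting moves to the attacker winning region, and then checks that the results distinguish $P_1$ from $P_2$. The depth bound, the encodings, and all names are assumptions for illustration; the bound is only a crude way to stay finite and is not the selection method of the paper.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

LTS: Dict[str, List[Tuple[str, str]]] = {  # Ex. 1, with hypothetical state names
    "P1": [("a", "b+c"), ("a", "d")],
    "P2": [("a", "b+d"), ("a", "c+d")],
    "b+c": [("b", "0"), ("c", "0")],
    "b+d": [("b", "0"), ("d", "0")],
    "c+d": [("c", "0"), ("d", "0")],
    "d": [("d", "0")],
    "0": [],
}
Position = Tuple[str, str, FrozenSet[str]]  # ("att" | "def", p, Q)


def moves_from(pos: Position):
    """Moves of Def. 13; observation moves carry the label ("obs", a)."""
    owner, p, Q = pos
    if owner == "def":
        return [("*", ("att", p, frozenset({q}))) for q in sorted(Q)]
    result = [("and", ("def", p, Q))]
    for a, p2 in LTS[p]:
        Q2 = frozenset(q2 for q in Q for (b, q2) in LTS[q] if b == a)
        result.append((("obs", a), ("att", p2, Q2)))
    if len(Q) == 1:
        (q,) = Q
        result.append(("not", ("att", q, frozenset({p}))))
    return result


def winning_region(start: Position) -> Set[Position]:
    """Attacker winning region of the reachable game (naive fixed point)."""
    positions, todo = {start}, [start]
    while todo:
        for _, h in moves_from(todo.pop()):
            if h not in positions:
                positions.add(h)
                todo.append(h)
    won: Set[Position] = set()
    changed = True
    while changed:
        changed = False
        for g in positions - won:
            targets = [h in won for _, h in moves_from(g)]
            if any(targets) if g[0] == "att" else all(targets):
                won.add(g)
                changed = True
    return won


def strategy_formulas(g: Position, won: Set[Position], depth: int) -> set:
    """Strategy formulas for attacker position g with at most `depth` attacker moves per branch."""
    if depth == 0 or g not in won:
        return set()
    formulas = set()
    for label, h in moves_from(g):
        if h not in won:
            continue  # stay inside the attacker winning region
        if label == "and":
            # One formula per defender answer; an empty answer set yields the empty conjunction.
            options = [strategy_formulas(h2, won, depth - 1) for _, h2 in moves_from(h)]
            formulas |= {("and", frozenset(choice)) for choice in product(*options)}
        elif label == "not":
            formulas |= {("not", f) for f in strategy_formulas(h, won, depth - 1)}
        else:
            formulas |= {("obs", label[1], f) for f in strategy_formulas(h, won, depth - 1)}
    return formulas


def holds(p: str, phi) -> bool:
    """Truth of a formula at state p, by recursion on the syntax tree."""
    if phi[0] == "obs":
        return any(holds(p2, phi[2]) for (a, p2) in LTS[p] if a == phi[1])
    if phi[0] == "and":
        return all(holds(p, sub) for sub in phi[1])
    return not holds(p, phi[1])


start = ("att", "P1", frozenset({"P2"}))
found = strategy_formulas(start, winning_region(start), depth=5)
print(all(holds("P1", f) and not holds("P2", f) for f in found))
```

With depth 5, the enumeration contains the formula $\langle a \rangle\bigwedge\{\neg\langle d \rangle\top\}$ of Ex. 4, encoded as nested tuples.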

Definition 15 (Winning strategy graph). Given the attacker winning region
$W_a$ and a starting position $g_0 \in W_a$, the attacker winning strategy graph $F_a$ is
the subset of the $\rightarrowtail$-graph that can be visited from $g_0$ when following all $\rightarrowtail$-edges
unless they lead out of $W_a$.


This graph can be cyclic. However, if the attacker plays inside their winning region
according to $F_a$, they will always have paths to their final winning positions. So
even though the attacker could loop (and thus lose), they can always end the
game and win in the sense of Def. 5.


Theorem 1. If $W_a$ is the attacker winning region of the spectroscopy game $\mathcal{G}_\triangle$,
every $\varphi \in \mathrm{Strat}_{F_a}((p, \{q\})_a)$ distinguishes $p$ from $q$.


Proof. Due to Lem. 1, it suffices to show that $\varphi \in \mathrm{Strat}_{F_a}((p, Q)_a)$ implies that
the attacker wins $\mathcal{G}_B[(p, Q, \{\varphi\})_a]$. We proceed by induction on the structure of
$\mathrm{Strat}_{F_a}$ with arbitrary $p$, $Q$.


— Assume $\varphi \in \mathrm{Strat}_{F_a}((p', Q')_a)$ and $((p, Q)_a, \langle b \rangle, (p', Q')_a) \in F_a$. By induction
hypothesis, the attacker wins $\mathcal{G}_B[(p', Q', \{\varphi\})_a]$. By moving there, the attacker
also wins $\mathcal{G}_B[(p, Q, \{\langle b \rangle\varphi\})_a]$, which must be a valid move as $F_a$ is a strategy
for $\mathcal{G}_\triangle$.

— Assume $\varphi \in \mathrm{Strat}_{F_a}((p', Q')_a)$ and $((p, Q)_a, \neg, (p', Q')_a) \in F_a$. By induction
hypothesis, the attacker wins $\mathcal{G}_B[(p', Q', \{\varphi\})_a]$. By the construction of $\mathcal{G}_\triangle$,
$Q = \{p'\}$. So the attacker can win $\mathcal{G}_B[(p, Q, \{\neg\varphi\})_a]$ by moving to this position
(with the defender having no choice when picking from $Q$).

— Assume $\varphi_{g_a'} \in \mathrm{Strat}_{F_a}(g_a')$ for all $g_a' = (p', \{q'\})_a \in I = \{g_a' \mid g_d \overset{*}{\rightarrowtail} g_a'\}$,
and $((p, Q)_a, \wedge, g_d) \in F_a$. Due to the construction of $\mathcal{G}_\triangle$, $Q = \{q' \mid
(p', \{q'\})_a \in I\}$ and $p' = p$. By induction hypothesis, the attacker wins all
$\mathcal{G}_B[(p', \{q'\}, \{\varphi_{g_a'}\})_a]$ and, as they can always focus on consuming just one
formula, also all $\mathcal{G}_B[(p, \{q'\}, \{\varphi_{g_a'} \mid g_a' \in I\})_a]$. This matches all the positions
the defender can move to after $(p, Q, \{\varphi_{g_a'} \mid g_a' \in I\})_d$. Moving there, the
attacker wins $\mathcal{G}_B[(p, Q, \{\bigwedge_{g_a' \in I} \varphi_{g_a'}\})_a]$.


Note that the theorem is only one-way, as every distinguishing formula can 
neutrally be extended by saying that some additional clause that is true for both 
processes does hold. Def. 14 will not find such bloated formulas. 

Due to cycles in the game graph, $\mathrm{Strat}_{F_a}$ will usually yield infinitely many
formulas. But we can become finite by injecting some way of discarding long 
formulas that unfold negation cycles or recursions of the underlying transition 
system. The next section will discuss how to do this without discarding the 
formulas that are interesting from the point of view of the spectrum. 


3.4 Retrieving Cheapest Distinguishing Formulas 


In our quest for the coarsest behavioral preorders (or equivalences) distinguishing 
two states, we actually are only interested in the ones that are part of the smallest 
observation languages from the spectrum (Def. 11). We can think of the amount
of HML-expressiveness used by a formula as its price. 

Let us look at the price structure of the spectrum from Def. 11. Table 1 
gives an overview of how many syntactic HML-features the observation languages 
may use at most. (If formulas use fewer, they still are considered part of that 
observation language.) So, we are talking budgets, in the price analogy. 


Conjunctions: How often may one run into a conjunction when descending
down the syntax tree? Negations in the beginning or following an observation
are counted as implicit conjunctions.

Positive deep branches: How many positive deep branches may appear in
each conjunction? We call subformulas of the form $\langle a \rangle$ or $\neg\langle a \rangle$ flat branches,
and the others deep branches.

Positive flat branches: How many positive flat branches may appear in each
conjunction? (See footnote 4.)

Negations: How many negations may be visited when descending?

Negations height: How high can the syntax trees under each negation be?


We say that a formula $\varphi_1$ dominates $\varphi_2$ if $\varphi_1$ has lower or equal values than $\varphi_2$
in each dimension of the metrics with at least one entry strictly lower. Let us
note the following facts:


4 There is a special case for failure-traces where 1 positive flat branch may be counted 
as deep, if there are no other deep branches. Hence the * in Table 1. 


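Table 1. Dimensions of observation expressiveness.

Observations                                          | Conjunctions | Positive deep branches | Positive flat branches | Negations | Negations height
trace $\mathcal{O}_T$                                 | 0            | 0                      | 0                      | 0         | 0
failure $\mathcal{O}_F$                               | 1            | 0                      | 0                      | 1         | 1
readiness $\mathcal{O}_R$                             | 1            | 0                      | $\infty$               | 1         | 1
failure-trace $\mathcal{O}_{FT}$                      | $\infty$     | 1                      | 0*                     | 1         | 1
ready-trace $\mathcal{O}_{RT}$                        | $\infty$     | 1                      | $\infty$               | 1         | 1
impossible-future $\mathcal{O}_{IF}$                  | 1            | 0                      | 0                      | 1         | $\infty$
possible-future $\mathcal{O}_{PF}$                    | 1            | $\infty$               | $\infty$               | 1         | $\infty$
ready-simulation $\mathcal{O}_{RS}$                   | $\infty$     | $\infty$               | $\infty$               | 1         | 1
(n+1)-nested-simulation $\mathcal{O}_{(n+1)S}$        | $\infty$     | $\infty$               | $\infty$               | n         | $\infty$
bisimulation $\mathcal{O}_B$                          | $\infty$     | $\infty$               | $\infty$               | $\infty$  | $\infty$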
def game_spectroscopy(S, p0, q0):
    G_△ = (G, G_d, ↣) := construct_spectroscopy_game(S)
    W_a := compute_winning_region(G_△)
    if (p0, {q0})_a ∈ W_a:
        F_a := winning_graph(G_△, W_a, (p0, {q0})_a)
        strats[] := ∅
        todo := [(p0, {q0})_a]
        while todo ≠ []:
            g := todo.dequeue()
            sg := strats[g]
            if sg = undefined:
                strats[g] := ∅
            gg' := {g' | (g, ·, g') ∈ F_a ∧ strats[g'] = undefined}
            if gg' = ∅:
                sg' := nonDominatedOrIF(Strat'_{F_a, strats}(g))
                if sg ≠ sg':
                    strats[g] := sg'
                    todo.enqueueEachEnd({g* | (g*, ·, g) ∈ F_a ∧ g* ∉ todo})
            else:
                todo.enqueueEachFront(gg')
        return strats[(p0, {q0})_a]
    else:
        R := {(p, q) | (p, {q})_a ∈ G_△ \ W_a}
        return R


Algorithm 1: Spectroscopy procedure. 


1. When formulas are constructed recursively, like the strategy formulas in
Def. 14, they can only contribute to dominated (i.e. more expensive) or
equivalently valued formulas with respect to the metrics.

2. Formulas can be incomparable. For example, $\langle a \rangle \bigwedge \{\langle b \rangle, \langle c \rangle\}$ and $\langle a \rangle \neg \langle a \rangle$,
corresponding to coordinates (1,0,2,0,0) and (1,0,0,1,1), are incomparable.

3. A locally more expensive formula may pay off as part of a bigger global
formula. For example, if two states are distinguished by $\neg\langle a \rangle$ and $\langle b \rangle$, the
dominated formula $\neg\langle a \rangle$ may later be handy to construct a (comparably
cheap) failure formula.
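
The domination order on price coordinates can be sketched in a few lines. This is only an illustration under the assumption that prices are plain integer tuples; the names `dominates` and `non_dominated` and the third, deliberately bloated, entry are hypothetical and not taken from the tool. The special treatment of impossible futures and the extra height dimension are not modeled here.

```python
from typing import Dict, Tuple

Price = Tuple[int, ...]


def dominates(a: Price, b: Price) -> bool:
    """a dominates b: a is lower or equal everywhere and strictly lower somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(prices: Dict[str, Price]) -> Dict[str, Price]:
    """Keep only the entries that no other entry dominates."""
    return {
        name: price
        for name, price in prices.items()
        if not any(dominates(other, price) for other in prices.values())
    }


# Coordinates of the two incomparable formulas from fact 2, plus one
# made-up price that both of them dominate.
candidates = {
    "<a>/\\{<b>,<c>}": (1, 0, 2, 0, 0),
    "<a>!<a>": (1, 0, 0, 1, 1),
    "bloated": (1, 0, 2, 1, 1),
}
kept = non_dominated(candidates)
```

Neither of the first two prices dominates the other, so both survive, while the bloated entry is pruned.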


These observations justify our algorithm to prune all formulas from the set
$\mathsf{Strat}_{F_a}(g)$ that are dominated with respect to the metrics by any other formula
in this set, unless they are impossible future traces of the form $\neg\langle a_1 \rangle \langle a_2 \rangle \ldots$. We
moreover add formula height in terms of observations as a dimension in the
metric, which leads to loop unfoldings being dominated by the shorter paths.
Algorithm 1 shows all the elements in concert. It constructs the spectroscopy
game $\mathcal{G}_\triangle$ (Def. 13) and computes its attacker winning strategy graph $F_a$ (Def. 15).
If the attacker cannot win, the algorithm returns a bisimulation relation. Otherwise,
it constructs the distinguishing formulas: It keeps a map strats of strategy
formulas that have been found so far and a list of game positions todo that have
to be updated. In every round, we take a game position g from todo. If some
of its successors have not been visited yet, we add them to the top of the work
list. Otherwise we call $\mathsf{Strat}'_{F_a, \mathit{strats}}(g)$ to compute distinguishing formulas using
the follow-up formulas found so far in strats. This function mostly corresponds
to Def. 14 with the twist that partial follow-ups are used instead of recursion,
and that the construction for conjunctions is split onto attacker and defender
positions. Of the found formulas, we keep only the non-dominated ones and
impossible future traces. If the result changes strats(g), we enqueue each game
predecessor to propagate the update there.

[Figure 2: tool screenshot. Recoverable editor input and output:]

    P1 = (a.(b.0 + c.0) + a.d.0)
    P2 = (a.(b.0 + d.0) + a.(c.0 + d.0))
    @compare "P1, P2"

    P1 distinguished from P2 under readiness, simulation preorder by ⟨a⟩⋀{⟨c⟩⊤, ⟨b⟩⊤}
    P1 distinguished from P2 under failure preorder by ⟨a⟩¬⟨d⟩⊤
    P2 distinguished from P1 under readiness, failure-trace preorder by ⟨a⟩⋀{⟨b⟩⊤, ¬⟨c⟩⊤}
    P2 distinguished from P1 under readiness, simulation preorder by ⟨a⟩⋀{⟨d⟩⊤, ⟨c⟩⊤}
    P2 distinguished from P1 under readiness, simulation preorder by ⟨a⟩⋀{⟨b⟩⊤, ⟨d⟩⊤}
    P2 distinguished from P1 under readiness, failure-trace preorder by ⟨a⟩⋀{¬⟨b⟩⊤, ⟨c⟩⊤}
    P1 preordered to P2 by traces
    P2 preordered to P1 by impossible-future

Figure 2. Screenshot of a linear-time–branching-time spectroscopy of the processes
from Ex. 1.

The algorithm structure is mostly usual fixed point machinery. It terminates 
because, for each state in a finite transition system, there must be a bound on the 
distinguishing mechanisms necessary with respect to our metrics, and Strat’ will 
only generate finitely many formulas under this bound. Keeping the impossible 
future formulas unbounded is alright, because they have to be constructed from 
trace formulas, which are subject to the bound. 


4 A Webtool for Equivalence Spectroscopy 


We have implemented the game and the generation of minimal distinguishing for- 
mulas in the “Linear-time—Branching-time Spectroscope”, a Scala.js program that 
can be run in the browser on https://concurrency-theory.org/ltbt-spectroscope/. 

The tool (screenshot in Fig. 2) consists of a text editor to input basic CCS-style 
processes and a view of the transition system graph. When queried to compare 
two processes, the tool yields the cheapest distinguishing HML-formulas it can 
find for both directions. Moreover, it displays the attacker-winning part of the 


A Game for Linear-time—Branching-time Spectroscopy 15 


spectroscopy game overlaid on the transition system. The latter can also
enlighten matters, at least for small and comparably deterministic transition
systems. From the found formulas, the tool can also infer the finest fitting
preorders for pairs of processes (Fig. 3).

To “benchmark” the quality of the distinguishing formulas, we have run the 
algorithm on all the finitary counterexample processes from the report version 
of “The Linear-time—Branching-time Spectrum” [12]. Table 2 reports the output 
of our tool, on how to distinguish certain processes. The results match the 
(in)equivalences given in [12]. In some cases, the tool finds slightly better ways of 
distinction using impossible futures equivalence, which was not known at the time 
of the original paper. All the computed formulas are quite elegant / minimal. 

For each of the examples (from papers) we have considered, the browser’s 
capacities sufficed to run the algorithm in 30 to 250 milliseconds. This does not 
mean that one should expect the algorithm to work for systems with thousands 
of states. There, the exponentialities of game and formula construction would 
hit. However, such big instances would usually stem from preexisting models 
where one would very much hope for the designers to already know under which 
semantics to interpret their model. The practical applications of our browser tool 
are more on the research side: When devising compiler optimizations, encodings, 
or distributed algorithms, it can be very handy to fully grasp the equivalence 
structure of isolated instances. The Linear-time—Branching-time Spectroscope 
supports this process. 


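Table 2. Formulas found by our implementation for some interesting processes from [12].

p | q | Cheapest distinguishing formulas found | From
P1 | P2 | $\langle a \rangle \bigwedge \{\langle c \rangle, \langle b \rangle\} \in \mathcal{O}_R \cap \mathcal{O}_{1S}$; $\langle a \rangle \neg \langle d \rangle \in \mathcal{O}_F$ | Ex. 1
a.b + a | a.b | $\langle a \rangle \neg \langle b \rangle \in \mathcal{O}_F$ | p. 13
a.b + a.(b + c) | a.(b + c) | $\langle a \rangle \neg \langle c \rangle \in \mathcal{O}_F$ | p. 16
a.(b + c.d) + a.(f + c.e) | a.(b + c.e) + a.(f + c.d) | $\langle a \rangle \bigwedge \{\langle c \rangle \langle d \rangle, \langle b \rangle\} \in \mathcal{O}_{RT} \cap \mathcal{O}_{PF} \cap \mathcal{O}_{1S}$; $\langle a \rangle \bigwedge \{\langle c \rangle \langle d \rangle, \neg \langle f \rangle\} \in \mathcal{O}_{FT} \cap \mathcal{O}_{PF}$; $\langle a \rangle \bigwedge \{\neg \langle b \rangle, \neg \langle c \rangle \langle d \rangle\} \in \mathcal{O}_{IF}$ (+3 variants) | p. 21
a.b + a.(b + c) + a.c | a.b + a.c | $\langle a \rangle \bigwedge \{\langle c \rangle, \langle b \rangle\} \in \mathcal{O}_R \cap \mathcal{O}_{1S}$ | p. 24
a.(b + a.(b + c.d) + a.c.e) + a.(a.c.d + a.(c.e + b)) | a.(a.(b + c.d) + a.c.e) + a.(a.c.d + a.(c.e + b) + b) | $\langle a \rangle \bigwedge \{\langle b \rangle, \langle a \rangle \bigwedge \{\langle c \rangle \langle d \rangle, \langle b \rangle\}\} \in \mathcal{O}_{RT} \cap \mathcal{O}_{1S}$; $\langle a \rangle \bigwedge \{\neg \langle b \rangle, \langle a \rangle \bigwedge \{\langle c \rangle \langle d \rangle, \neg \langle b \rangle\}\} \in \mathcal{O}_{FT}$ | p. 27
a.(b.c + b.d) | a.b.c + a.b.d | $\langle a \rangle \bigwedge \{\langle b \rangle \langle c \rangle, \langle b \rangle \langle d \rangle\} \in \mathcal{O}_{PF} \cap \mathcal{O}_{1S}$ | p. 31
a.b.c + a.(b.c + b.d) | a.(b.c + b.d) | $\langle a \rangle \neg \langle b \rangle \langle d \rangle \in \mathcal{O}_{IF}$ | p. 34
a.b + a + a.c | a.b + a.(b + c) + a.c | $\langle a \rangle \bigwedge \{\neg \langle b \rangle, \neg \langle c \rangle\} \in \mathcal{O}_F$ | p. 38
a.b.c + a.(b.c + b) | a.(b.c + b) | $\langle a \rangle \neg \langle b \rangle \neg \langle c \rangle \in \mathcal{O}_B$ | p. 42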
Figure 3. Tool output of finest preorders for transition systems. (Left: Ex. 1; right:
a.b + a.(b + c) + a.c vs. a.b + a + a.c.)


5 Related Work and Alternatives 


The game and the algorithm presented fill a blank spot in between the following 
previous directions of work: 


Distinguishing formulas in general. Cleaveland [5] showed how to restore
(non-minimal) distinguishing formulas for bisimulation equivalence from the
execution of a bisimilarity checker based on the splitting of blocks. There, it has
been named as possible future work to extend the construction to other notions of
the spectrum. We are not aware of any place where this has previously been done
completely. But there are related islands like the encoding between CTL and
failure traces by Bruda and Zhang [7]. There is also more recent work like Jasper
et al. [15] extending to the generation of characteristic invariant formulas for
bisimulation classes. Previous algorithms for bisimulation in-equivalence tend to
generate formulas that alternate $\langle a \rangle$ and $[b]$ observations while pushing negation
to the innermost level. Such formulas cannot be linked to the spectrum as easily
as ours.


Game-characterizations of the spectrum. After Shukla et al. [18] had shown 
how to characterize many notions of equivalence by HORNSAT games, Chen and 
Deng [4] presented a hierarchy of games characterizing all the equivalences of the 
linear-time—branching-time spectrum. The games from [4] cannot be applied as 
easily as ours in algorithms because they allow word moves and thus are infinite 
already for finite transition systems with cycles. Constructing distinguishing 
formulas from attacker strategies of these games would be less convenient than 
in our solution. Their parametric approach is comparable to fixing maximal price 
budgets ex ante. Our on-the-fly picking of minimal prices is more flexible. 
Using game-characterizations for distinguishing formulas. There is
recent work by Mika-Michalski et al. [16] on constructing distinguishing formulas 
using games in a more abstract coalgebraic setting focussed on the absence of 
bisimulation. The game and formula generation there, however, cannot easily be 
adapted for our purpose of performing a spectroscopy also for weaker notions. 


Alternatives. One can also find the finest notion of equivalence between two 
states by gradually minimizing the transition system with ever coarser equiv- 
alences from bisimulation to trace equivalence until the states are conflated 
(possibly also trying branches). Within a big tool suite of highly optimized algo- 
rithms this should be quite efficient. We preferred the game approach, because it 
can uniformly be extended to the whole spectrum and also has the big upside of 
explaining the in-equivalences by distinguishing formulas. 

One avenue of optimization for our approach that we have already tried is to run
the formula search on a directed acyclic subgraph of the winning strategy graph.
For our purpose of finding most fitting equivalences, DAG-ification may preclude
the algorithm from finding the right formulas. On the other hand, if one is mainly
interested in a short distinguishing formula, for instance, one can speed up the
process with DAG-ification by the order of remaining game rounds.


6 Conclusion 


In this paper, we have established a convenient way of finding distinguishing 
formulas that use a minimal amount of expressiveness. 

System analysis tools can employ the algorithm to tell their users in more 
detail how equivalent two process models are. While the generic approach is 
costly, instantiations to more specific, symbolic, compositional, on-the-fly or 
depth-bounded settings may enable wider applications. There are also some 
algorithmic tricks (like building the concrete formulas only after having found the 
price bounds and heuristics in handling the game graph) we have not explored in 
this paper. 

So far, we have only looked at strong notions of equivalence [10]. We plan to
verify the game in Isabelle/HOL and to extend our algorithm, so it also deals
with weak notions of equivalence [11]. These equivalences abstract over $\tau$-actions
representing “internal activity” and correspond to observation languages with a
special temporal $\langle \varepsilon \rangle$-observation (cf. [9]). This would generalize work on weak
game characterizations such as de Frutos-Escrig et al.’s [8] and our own [2,3]. The
vision is to arrive at one certifying algorithm that can yield finest equivalences
and cheapest distinguishing formulas as witnesses for the whole discrete spectrum.

On a different note, our group is also working on an educational computer
game about process equivalences (see footnote 5). The (theoretical) game of this paper can likely
be adapted to go in the other direction: from formulas to distinguished transition
systems. It may thereby synthesize levels for the (computer) game. So, in the
end, all this might actually contribute to actual people having actual fun.

5 A prototype featuring equivalences between strong bisimulation and coupled simulation
(result of Dominik Peacock’s bachelor thesis) can be played on
https://www.concurrency-theory.org/rvg-game/.
Abstract. Several problems in planning and reactive synthesis can be
reduced to the analysis of two-player quantitative graph games. Opti- 
mization is one form of analysis. We argue that in many cases it may be 
better to replace the optimization problem with the satisficing problem, 
where instead of searching for optimal solutions, the goal is to search for 
solutions that adhere to a given threshold bound. 

This work defines and investigates the satisficing problem on a two-player
graph game with the discounted-sum cost model. We show that while the
satisficing problem can be solved using numerical methods just like the
optimization problem, this approach does not render compelling benefits
over optimization. When the discount factor is, however, an integer,
we present another approach to satisficing, which is purely based on automata
methods. We show that this approach is algorithmically more
performant (both theoretically and empirically) and demonstrate the
broader applicability of satisficing over optimization.


1 Introduction 


Quantitative properties of systems are increasingly being explored in automated
reasoning [4,14,16,20,21,26]. In decision-making domains such as planning and
reactive synthesis, quantitative properties have been deployed to describe soft
constraints such as quality measures [11], cost and resources [18,22], rewards [31],
and the like. Since these constraints are soft, it suffices to generate solutions that
are good enough w.r.t. the quantitative property.

Existing approaches on the analysis of quantitative properties have, however, 
primarily focused on optimization of these constraints, i.e., to generate optimal 
solutions. We argue that there may be disadvantages to searching for optimal 
solutions, where good enough ones may suffice. First, optimization may be more 
expensive than searching for good-enough solutions. Second, optimization re- 
stricts the search-space of possible solutions, and thus could limit the broader 
applicability of the resulting solutions. For instance, to generate solutions that 
operate within battery life, it is too restrictive to search for solutions with mini- 
mal battery consumption. Besides, solutions with minimal battery consumption 
may be limited in their applicability, since they may not satisfy other goals, such 
as desirable temporal tasks. 

To this end, this work focuses on directly searching for good-enough solutions.
We propose an alternate form of analysis of quantitative properties in
which the objective is to search for a solution that adheres to a given threshold
bound, possibly derived from a physical constraint such as battery life. We
call this the satisficing problem, a term popularized by H. A. Simon in economics
to mean satisfy and suffice, implying a search for good-enough solutions [1].
Through theoretical and empirical investigation, we make the case that satisficing
is algorithmically more performant than optimization and, further, that
satisficing solutions may have broader applicability than optimal solutions.


This work formulates and investigates the satisficing problem on two-player,
finite-state games with the discounted-sum (DS) cost model, which is a standard
cost-model in decision-making domains [24,25,28]. In these games, players take
turns to pass a token along the transition relation between the states. As the
token is pushed around, the play accumulates costs along the transitions using
the DS cost model. The players are assumed to have opposing objectives: one
player maximizes the cost, while the other player minimizes it. We define the
satisficing problem as follows: Given a threshold value $v \in \mathbb{Q}$, does there exist a
strategy for the minimizing (or maximizing) player that ensures the cost of all
resulting plays is strictly or non-strictly lower (or greater) than the threshold $v$?


Clearly, the satisficing problem is decidable since the optimization problem
on these quantitative games is known to be solvable in pseudo-polynomial
time [17,23,32]. To design an algorithm for satisficing, we first adapt the celebrated
value-iteration (VI) based algorithm for optimization [32] (§ 3). We show,
however, that this algorithm, called VISatisfice, displays the same complexity as
optimization and hence renders no complexity-theoretic advantage. To obtain
worst-case complexity, we perform a thorough worst-case analysis of VI for optimization.
It is interesting that a thorough analysis of VI for optimization had
hitherto been absent from the literature, despite the popularity of VI. To address
this gap, we first prove that VI should be executed for $O(|V|)$ iterations
to compute the optimal value, where $V$ and $E$ refer to the sets of states and
transitions in the quantitative game. Next, to compute the overall complexity,
we take into account the cost of arithmetic operations as well, since they appear
in abundance in VI. We demonstrate an orders-of-magnitude difference between
the complexity of VI under different cost-models of arithmetic. For instance,
for integer discount factors, we show that VI is $O(|V| \cdot |E|)$ and $O(|V|^2 \cdot |E|)$
under the unit-cost and bit-cost models of arithmetic, respectively. Clearly, this
shows that VI for optimization, and hence VISatisfice, does not scale to large
quantitative games.

We then present a purely automata-based approach for satisficing (§ 4). While
this approach applies to integer discount factors only, it solves satisficing in
$O(|V| + |E|)$ time. This shows that there is a fundamental separation in complexity
between satisficing and VI-based optimization, as even the lower bound
on the number of iterations in VI is higher. In this approach, the satisficing problem
is reduced to solving a safety or reachability game. Our core observation is
that the criteria to fulfil satisficing with respect to threshold value $v \in \mathbb{Q}$ can be
expressed as membership in an automaton that accepts a weight sequence $A$ iff
$DS(A, d) \mathrel{R} v$ holds, where $d > 1$ is the discount factor and $R \in \{\le, \ge, <, >\}$. In
existing literature, such automata are called comparator automata (comparators,
in short) when the threshold value $v = 0$ [6,7]. They are known to have a compact
safety or co-safety automaton representation [9,19], which could be used to
reduce the satisficing problem with zero threshold value. To solve satisficing for
arbitrary threshold values $v \in \mathbb{Q}$, we extend existing results on comparators to
permit arbitrary but fixed threshold values $v \in \mathbb{Q}$. An empirical comparison between
the performance of VISatisfice, VI for optimization, and the automata-based
solution for satisficing shows that the latter outperforms the others in efficiency,
scalability, and robustness.

In addition to improved algorithmic performance, we demonstrate that satisficing
solutions have broader applicability than optimal ones (§ 6). We examine
this with respect to their ability to extend to temporal goals. That is, the problem
is to find optimal/satisficing solutions that also satisfy a given temporal goal.
Prior results have shown this to not be possible with optimal solutions [13]. In
contrast, we show satisficing extends to temporal goals when the discount factor
is an integer. This occurs because both satisficing and satisfaction of temporal
goals are solved via automata-based techniques, which can be easily integrated.

In summary, this work contributes to showing that satisficing has algorithmic
and applicability advantages over optimization in (deterministic) quantitative
games. In particular, we have shown that the automata-based approach
for satisficing has advantages over approaches in numerical methods like value
iteration. This gives further evidence in favor of automata-based quantitative
reasoning and opens up several compelling directions for future work.


2 Preliminaries 


2.1 Two-player graph games 


Reachability and safety games. Both reachability and safety games are defined
over the structure $G = (V = V_0 \uplus V_1, v_{\mathit{init}}, E, F)$ [30]. It consists of a directed
graph $(V, E)$, and a partition $(V_0, V_1)$ of its states $V$. State $v_{\mathit{init}}$ is the initial state
of the game. The set of successors of state $v$ is designated by $vE$. For convenience,
we assume that every state has at least one outgoing edge, i.e., $vE \neq \emptyset$ for all
$v \in V$. $F \subseteq V$ is a non-empty set of states. $F$ is referred to as accepting and
rejecting states in reachability and safety games, respectively.

A play of a game involves two players, denoted by $P_0$ and $P_1$, to create an
infinite path by moving a token along the transitions as follows: At the beginning,
the token is at the initial state. If the current position $v$ belongs to $V_i$, then $P_i$
chooses the successor state from $vE$. Formally, a play $\rho = v_0 v_1 v_2 \ldots$ is an infinite
sequence of states such that the first state $v_0 = v_{\mathit{init}}$, and each pair of successive
states is a transition, i.e., $(v_k, v_{k+1}) \in E$ for all $k \geq 0$. A play is winning for
player $P_1$ in a reachability game if it visits an accepting state, and winning for
player $P_0$ otherwise. The opposite holds in safety games, i.e., a play is winning
for player $P_1$ if it does not visit any rejecting state, and winning for $P_0$ otherwise.

A strategy for a player is a recipe that guides the player on which state to go
to next based on the history of the play. A strategy is winning for a player $P_i$ if


On Satisficing in Quantitative Games 23 


for all strategies of the opponent player $P_{1-i}$, the resulting plays are winning for
$P_i$. To solve a graph game means to determine whether there exists a winning
strategy for player $P_1$. Reachability and safety games are solved in $O(|V| + |E|)$ time.
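
The linear-time bound for reachability games is usually obtained with a backward attractor computation. The following is a minimal sketch of that standard technique, not code from this paper; the function name, the dictionary-based encoding, and the small example game are assumptions made for illustration.

```python
from collections import deque


def reachability_winning_region(states, owner, edges, accepting):
    """States from which player 1 can force a visit to an accepting state.

    owner[v] is 0 or 1; edges is a list of distinct (u, v) pairs.
    Each edge is inspected a constant number of times.
    """
    remaining = {v: 0 for v in states}   # successors not yet known to be winning
    preds = {v: [] for v in states}
    for u, v in edges:
        remaining[u] += 1
        preds[v].append(u)

    win = set(accepting)
    queue = deque(accepting)
    while queue:
        v = queue.popleft()
        for u in preds[v]:
            if u in win:
                continue
            if owner[u] == 1:
                # Player 1 needs only one edge into the winning region.
                win.add(u)
                queue.append(u)
            else:
                # Player 0 is trapped only if every successor is winning.
                remaining[u] -= 1
                if remaining[u] == 0:
                    win.add(u)
                    queue.append(u)
    return win


# Hypothetical example: state 3 is accepting, states 2 and 4 can avoid it.
states = [0, 1, 2, 3, 4]
owner = {0: 1, 1: 0, 2: 0, 3: 1, 4: 0}
edges = [(0, 1), (0, 2), (1, 3), (2, 2), (3, 3), (4, 3), (4, 2)]
region = reachability_winning_region(states, owner, edges, [3])
```

A safety game can be handled with the same routine by computing the region from which the opponent reaches the rejecting states and taking the complement.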


Quantitative graph games. A quantitative graph game (or quantitative game, in
short) is defined over a structure $G = (V = V_0 \uplus V_1, v_{\mathit{init}}, E, \gamma)$. $V$, $V_0$, $V_1$, $v_{\mathit{init}}$, $E$,
plays and strategies are defined as earlier. Each transition of the game is associated
with a cost determined by the cost function $\gamma : E \to \mathbb{Z}$. The cost sequence
of a play $\rho$ is the sequence of costs $w_0 w_1 w_2 \ldots$ such that $w_k = \gamma((v_k, v_{k+1}))$ for
all $k \geq 0$. Given a discount factor $d > 1$, the cost of play $\rho$, denoted $wt(\rho)$, is the
discounted sum of its cost sequence, i.e., $wt(\rho) = DS(\rho, d) = w_0 + \frac{w_1}{d} + \frac{w_2}{d^2} + \cdots$.
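
As a small worked illustration of the discounted sum, the sketch below evaluates a finite prefix of a cost sequence exactly. The helper name `discounted_sum_prefix` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is an assumption chosen to avoid floating-point error.

```python
from fractions import Fraction


def discounted_sum_prefix(costs, d):
    """Exact value of w_0 + w_1/d + w_2/d^2 + ... over a finite prefix."""
    d = Fraction(d)
    total = Fraction(0)
    for i, w in enumerate(costs):
        total += Fraction(w) / d**i
    return total


# With d = 2 the prefix 1, 2, 4 contributes 1 + 2/2 + 4/4.
value = discounted_sum_prefix([1, 2, 4], 2)
```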


2.2 Automata and formal languages 


Büchi automata. A Büchi automaton is a tuple $\mathcal{A} = (S, \Sigma, \delta, s_I, F)$, where
$S$ is a finite set of states, $\Sigma$ is a finite input alphabet, $\delta \subseteq (S \times \Sigma \times S)$ is the
transition relation, state $s_I \in S$ is the initial state, and $F \subseteq S$ is the set of
accepting states [30]. A Büchi automaton is deterministic if for all states $s$ and
inputs $a$, $|\{s' \mid (s, a, s') \in \delta\}| \leq 1$. For a word $w = w_0 w_1 \ldots \in \Sigma^\omega$, a
run $\rho$ of $w$ is a sequence of states $s_0 s_1 \ldots$ s.t. $s_0 = s_I$, and $\tau_i = (s_i, w_i, s_{i+1}) \in \delta$
for all $i$. Let $\mathit{inf}(\rho)$ denote the set of states that occur infinitely often in run $\rho$.
A run $\rho$ is an accepting run if $\mathit{inf}(\rho) \cap F \neq \emptyset$. A word $w$ is an accepting word if it
has an accepting run. The language of Büchi automaton $\mathcal{A}$ is the set of all words
accepted by $\mathcal{A}$. Languages accepted by Büchi automata are called $\omega$-regular.
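
For ultimately periodic words and a deterministic automaton, the acceptance condition can be checked directly. The sketch below is an illustration under these assumptions; the function name, the dictionary encoding of $\delta$, and the example automaton (which accepts words with infinitely many a's) are hypothetical.

```python
def accepts_lasso(delta, init, accepting, head, loop):
    """Does a deterministic Büchi automaton accept the word head · loop^ω?

    delta maps (state, letter) to the unique successor state; a missing
    entry means the run dies. loop must be non-empty.
    """
    state = init
    for letter in head:
        state = delta.get((state, letter))
        if state is None:
            return False

    first_seen = {}   # state at the start of a loop pass -> index of that pass
    hit = []          # hit[i]: pass i visits an accepting state
    while state not in first_seen:
        first_seen[state] = len(hit)
        visited_accepting = False
        for letter in loop:
            state = delta.get((state, letter))
            if state is None:
                return False
            visited_accepting = visited_accepting or state in accepting
        hit.append(visited_accepting)

    # Passes from first_seen[state] onwards repeat forever.
    return any(hit[first_seen[state]:])


# State 1 means "the last letter read was a"; it is the accepting state.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
infinitely_many_a = accepts_lasso(delta, 0, {1}, "b", "ab")
finitely_many_a = accepts_lasso(delta, 0, {1}, "a", "b")
```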


Safety and co-safety languages. Let $\mathcal{L} \subseteq \Sigma^\omega$ be a language over alphabet $\Sigma$. A
finite word $w \in \Sigma^*$ is a bad prefix for $\mathcal{L}$ if for all infinite words $y \in \Sigma^\omega$, $w \cdot y \notin \mathcal{L}$.
A language $\mathcal{L}$ is a safety language if every word $w \notin \mathcal{L}$ has a bad prefix for
$\mathcal{L}$ [3]. A co-safety language is the complement of a safety language [19]. Safety
and co-safety languages that are $\omega$-regular are represented by specialized Büchi
automata called safety and co-safety automata, respectively.


Comparison language and comparator automata. Given integer bound $\mu > 0$, discount
factor $d > 1$, and relation $R \in \{<, >, \leq, \geq, =, \neq\}$, the comparison language
with upper bound $\mu$, relation $R$, discount factor $d$ is the language of words over
the alphabet $\Sigma = \{-\mu, \ldots, \mu\}$ that accepts $A \in \Sigma^\omega$ iff $DS(A, d) \mathrel{R} 0$ holds [5,9].
The comparator automaton with upper bound $\mu$, relation $R$, discount factor $d$ is the
automaton that accepts the corresponding comparison language [6]. Depending
on $R$, these languages are safety or co-safety [9]. A comparison language is said
to be $\omega$-regular if its automaton is a Büchi automaton. Comparison languages
are $\omega$-regular iff the discount factor is an integer [7].


3 Satisficing via Optimization 


This section shows that there are no complexity-theoretic benefits to solving the 
satisficing problem via algorithms for the optimization problem. 
§ 3.1 formally defines the satisficing problem and reviews the celebrated value-iteration
(VI) algorithm for optimization by Zwick and Paterson (ZP). While
ZP claim without proof that the algorithm runs in pseudo-polynomial time [32],
its worst-case analysis is absent from literature. This section presents a detailed
account of the said analysis, and exposes the dependence of VI’s worst-case
complexity on the discount factor $d > 1$ and the cost-model for arithmetic operations,
i.e., unit-cost or bit-cost model. The analysis is split into two parts: First,
§ 3.2 shows it is sufficient to terminate after a finite number of iterations. Next,
§ 3.3 accounts for the cost of arithmetic operations per iteration to compute VI’s
worst-case complexity under unit- and bit-cost models of arithmetic. Finally,
§ 3.4 presents and analyzes our VI-based algorithm for satisficing, VISatisfice.


3.1 Satisficing and Optimization 


Definition 1 (Satisficing problem). Given a quantitative graph game $G$ and
a threshold value $v \in \mathbb{Q}$, the satisficing problem is to determine whether the
minimizing (or maximizing) player has a strategy that ensures the cost of all
resulting plays is strictly or non-strictly lower (or greater) than the threshold $v$.


The satisficing problem can clearly be solved by solving the optimization problem.
The optimal cost of a quantitative game is that value such that the maximizing
and minimizing players can guarantee that the cost of plays is at least
and at most the optimal value, respectively.


Definition 2 (Optimization problem). Given a quantitative graph game G, 
the optimization problem is to compute the optimal cost from all possible plays 
from the game, under the assumption that the players have opposing objectives 
to maximize and minimize the cost of plays, respectively. 


Seminal work by Zwick and Paterson showed the optimization problem is
solved by the value-iteration algorithm presented here [32]. Essentially, the algorithm
plays a min-max game between the two players. Let $wt_k(v)$ denote
the optimal cost of a $k$-length game that begins in state $v \in V$. Then $wt_k(v)$
can be computed using the following equations: The optimal cost of a 1-length
game beginning in state $v \in V$ is $\max\{\gamma(v, w) \mid (v, w) \in E\}$ if $v \in V_0$ and
$\min\{\gamma(v, w) \mid (v, w) \in E\}$ if $v \in V_1$. Given the optimal cost of a $k$-length game,
the optimal cost of a $(k+1)$-length game is computed as follows:

$$wt_{k+1}(v) = \begin{cases} \max\{\gamma(v, w) + \frac{1}{d} \cdot wt_k(w) \mid (v, w) \in E\} & \text{if } v \in V_0 \\ \min\{\gamma(v, w) + \frac{1}{d} \cdot wt_k(w) \mid (v, w) \in E\} & \text{if } v \in V_1 \end{cases}$$

Let $W$ be the optimal cost. Then, $W = \lim_{k \to \infty} wt_k(v_{\mathit{init}})$ [27,32].
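
The recurrence can be sketched as follows. This is an illustrative implementation under the assumptions that player 0 maximizes, player 1 minimizes, and exact rationals are used; the function name, the data encoding, and the two-state example game are hypothetical and not taken from the paper's implementation.

```python
from fractions import Fraction


def value_iteration(states, owner, edges, cost, d, rounds):
    """Return wt_k for k = rounds (rounds >= 1) as a dict state -> Fraction."""
    succ = {v: [w for (u, w) in edges if u == v] for v in states}
    pick = {0: max, 1: min}   # player 0 maximizes, player 1 minimizes

    # Base case: optimal cost of a 1-length game.
    wt = {v: pick[owner[v]](Fraction(cost[(v, w)]) for w in succ[v])
          for v in states}
    # Inductive step: from k-length games to (k+1)-length games.
    for _ in range(rounds - 1):
        wt = {v: pick[owner[v]](cost[(v, w)] + wt[w] / d for w in succ[v])
              for v in states}
    return wt


# Hypothetical game: state 0 (maximizer) may loop at cost 1 or move to
# state 1 at cost 0; state 1 (minimizer) can only loop at cost 3.
states = [0, 1]
owner = {0: 0, 1: 1}
edges = [(0, 0), (0, 1), (1, 1)]
cost = {(0, 0): 1, (0, 1): 0, (1, 1): 3}
wt3 = value_iteration(states, owner, edges, cost, 2, 3)
```

In this example the values approach 3 for state 0 and 6 for state 1 as the number of rounds grows, but no finite round reaches them exactly.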


3.2 VI: Number of iterations 


The VI algorithm described above converges only in the limit and never terminates
on its own. To compute the algorithm’s worst-case complexity, we establish a linear
bound on the number of iterations that is sufficient to compute the optimal cost.
We also establish a matching lower bound, showing that our analysis is tight.
Upper bound on number of iterations. The upper bound computation utilizes one
key result from existing literature: There exist memoryless strategies for both
players such that the cost of the resulting play is the optimal cost [27]. Then,
there must exist an optimal play in the form of a simple lasso in the quantitative
game, where a lasso is a play represented as $v_0 v_1 \ldots v_n (s_0 s_1 \ldots s_m)^\omega$. We call the
initial segment $v_0 v_1 \ldots v_n$ its head, and the cycle segment $s_0 s_1 \ldots s_m$ its loop. A
lasso is simple if each state in $\{v_0, \ldots, v_n, s_0, \ldots, s_m\}$ is distinct. We begin our proof
by assigning constraints on the optimal cost using the simple lasso structure of
an optimal play (Corollary 1 and Corollary 2).

Let $l = a_0 \ldots a_n (b_0 \ldots b_m)^\omega$ be the cost sequence of a lasso such that $l_1 =
a_0 \ldots a_n$ and $l_2 = b_0 \ldots b_m$ are the cost sequences of the head and the loop,
respectively. Then the following can be said about $DS(l_1 \cdot l_2^\omega, d)$:


Lemma 1. Let $l = l_1 \cdot (l_2)^\omega$ represent an integer cost sequence of a lasso, where
$l_1$ and $l_2$ are the cost sequences of the head and loop of the lasso. Let $d = \frac{p}{q}$
be the discount factor. Then, $DS(l, d)$ is a rational number with denominator at
most $(p^{|l_2|} - q^{|l_2|}) \cdot (p^{|l_1|})$.
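As a sanity check on Lemma 1, the discounted sum of a lasso can be evaluated in closed form with exact rationals. The sketch below assumes the convention $DS(A, d) = \sum_i A[i]/d^i$ (first cost undiscounted), which matches the VI recurrence; the helper name `ds_lasso` is hypothetical and the loop is assumed to be non-empty.

```python
from fractions import Fraction

def ds_lasso(head, loop, d):
    """Discounted sum of head . loop^omega with DS(A, d) = sum_i A[i] / d**i."""
    d = Fraction(d)
    head_sum = sum(Fraction(a) / d**i for i, a in enumerate(head))
    loop_sum = sum(Fraction(b) / d**j for j, b in enumerate(loop))
    # Geometric series over the repetitions of the (non-empty) loop.
    loop_value = loop_sum / (1 - d ** -len(loop))
    return head_sum + loop_value / d ** len(head)

# Example with d = p/q = 3/2, head l1 = (1), loop l2 = (1, 0).
p, q = 3, 2
value = ds_lasso([1], [1, 0], Fraction(p, q))
bound = (p**2 - q**2) * p**1  # (p^|l2| - q^|l2|) * p^|l1|
print(value, value.denominator <= bound)
```

The printed denominator check corresponds to the bound stated in the lemma for this particular head and loop length.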


Lemma 1 is proven by unrolling $DS(l_1 \cdot l_2^\omega, d)$. Then, the first constraint on
the optimal cost is as follows:


Corollary 1. Let $G = (V, v_{init}, E, \gamma)$ be a quantitative graph game. Let $d = \frac{p}{q}$ be
the discount factor. Then the optimal cost of the game is a rational number with
denominator at most $(p^{|V|} - q^{|V|}) \cdot (p^{|V|})$.


Proof. Recall, there exists a simple lasso that computes the optimal cost. Since a
simple lasso is of |V|-length at most, the lengths of its head and loop are at most
|V| each. So, the expression from Lemma 1 simplifies to $(p^{|V|} - q^{|V|}) \cdot (p^{|V|})$.


The second constraint has to do with the minimum non-zero difference be- 
tween the cost of simple lassos: 


Corollary 2. Let $G = (V, v_{init}, E, \gamma)$ be a quantitative graph game. Let $d = \frac{p}{q}$
be the discount factor. Then the minimal non-zero difference between the cost of
simple lassos is a rational with denominator at most $(p^{|V|} - q^{|V|})^2 \cdot (p^{2|V|})$.


Proof. Given two rational numbers with denominator at most a, an upper bound
on the denominator of the minimal non-zero difference of these two rational numbers
is $a^2$. Then, using the result from Corollary 1, we immediately obtain that the
minimal non-zero difference between the cost of two lassos is a rational number
with denominator at most $(p^{|V|} - q^{|V|})^2 \cdot (p^{2|V|})$.


For notational convenience, let $bound_W = (p^{|V|} - q^{|V|}) \cdot (p^{|V|})$ and
$bound_{diff} = (p^{|V|} - q^{|V|})^2 \cdot (p^{2|V|})$. Wlog, |V| > 1. Since
$\frac{1}{bound_{diff}} < \frac{1}{bound_W}$, there is at most one rational number with
denominator $bound_W$ or less in any interval of size $\frac{1}{bound_{diff}}$. Thus, if we
can identify an interval of size less than $\frac{1}{bound_{diff}}$ around the optimal cost,
then due to Corollary 1, the optimal cost will be the unique rational number with
denominator $bound_W$ or less in this interval.
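The uniqueness argument suggests a simple way to recover the exact optimal cost from a sufficiently close approximation. The sketch below is one possible realization, not necessarily the authors': it uses the standard-library method `Fraction.limit_denominator`, which returns the closest rational with a bounded denominator; the function names `bounds` and `snap_to_optimal` are hypothetical.

```python
from fractions import Fraction

def bounds(p, q, n):
    """Return (bound_W, bound_diff) for discount factor d = p/q and n = |V|."""
    bound_w = (p**n - q**n) * p**n
    bound_diff = bound_w**2  # (p^n - q^n)^2 * p^(2n)
    return bound_w, bound_diff

def snap_to_optimal(approx, bound_w):
    """Closest rational to `approx` whose denominator is at most bound_w.

    If the optimal cost lies within an interval of size < 1/bound_diff that is
    centred at `approx`, it is the only candidate, so it is the value returned.
    """
    return Fraction(approx).limit_denominator(bound_w)

bound_w, bound_diff = bounds(2, 1, 2)
# An approximation that is off by 1/300 < 1/(2 * bound_diff) is snapped back.
print(snap_to_optimal(Fraction(5, 12) + Fraction(1, 300), bound_w))
```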
Fig. 1. Sketch of game graph which requires Ω(|V|) iterations


Thus, the final question is to identify a small enough interval (of size $\frac{1}{bound_{diff}}$
or less) such that the optimal cost lies within it. To find an interval around the
optimal cost, we use a finite-horizon approximation of the optimal cost:


Lemma 2. Let W be the optimal cost in quantitative game G. Let μ > 0 be the
maximum of absolute value of cost on transitions in G. Then, for all k ∈ N,

$$
wt_k(v_{init}) - \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1} \;\leq\; W \;\leq\; wt_k(v_{init}) + \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1}
$$


Proof. Since W is the limit of $wt_k(v_{init})$ as k → ∞, W must lie in between the
minimum and maximum cost possible if the k-length game is extended to an
infinite-length game. The minimum possible extension would be when the
k-length game is extended by iterations in which the cost incurred in each round
is −μ. Therefore, the minimum possible value is
$wt_k(v_{init}) - \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1}$. Similarly,
the maximum possible value is $wt_k(v_{init}) + \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1}$.
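The correction term in this proof is the discounted tail of a constant sequence, which can be spelled out as a geometric series. The derivation below assumes the convention $DS(A, d) = \sum_i A[i]/d^i$, so the k-length game covers rounds 0 to k−1 and the extension starts at round k:

```latex
\[
\sum_{i=k}^{\infty} \frac{\mu}{d^{i}}
  \;=\; \frac{\mu}{d^{k}} \cdot \frac{1}{1 - \frac{1}{d}}
  \;=\; \frac{\mu}{d^{k}} \cdot \frac{d}{d-1}
  \;=\; \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1}
\]
```

Replacing μ by −μ gives the lower end of the interval by the same computation.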


Now that we have an interval around the optimal cost, we can compute the
number of iterations of VI required to make it smaller than $\frac{1}{bound_{diff}}$.


Theorem 1. Let $G = (V, v_{init}, E, \gamma)$ be a quantitative graph game. Let μ > 0
be the maximum of absolute value of costs along transitions. The number of
iterations required by the value-iteration algorithm is

1. $O(|V|)$ when discount factor d ≥ 2,
2. $O\left(\frac{\log \mu}{d-1} + |V|\right)$ when discount factor 1 < d < 2.


Proof (Sketch). As discussed in Corollaries 1 and 2 and Lemma 2, the optimal cost is
the unique rational number with denominator $bound_W$ or less within the interval
$\left(wt_k(v_{init}) - \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1},\; wt_k(v_{init}) + \frac{1}{d^{k-1}} \cdot \frac{\mu}{d-1}\right)$
for a large enough k > 0 such that the interval's size is less than
$\frac{1}{bound_{diff}}$. Thus, our task is to determine the value of k > 0 such that
$2 \cdot \frac{\mu}{(d-1) \cdot d^{k-1}} < \frac{1}{bound_{diff}}$ holds. The case d ≥ 2 is easy to simplify.
The case 1 < d < 2 involves approximations of logarithms of small values. □
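For concrete parameters, the smallest k that satisfies the inequality in the proof sketch can be found by direct search rather than by asymptotic simplification. The following is an illustrative sketch; the function name `iterations_needed` is hypothetical, and the strict inequality follows the "less than" wording used above.

```python
from fractions import Fraction

def iterations_needed(p, q, n, mu):
    """Smallest k >= 1 with 2*mu / ((d-1) * d**(k-1)) < 1/bound_diff, d = p/q."""
    d = Fraction(p, q)
    bound_diff = ((p**n - q**n) * p**n) ** 2
    target = Fraction(1, bound_diff)
    k = 1
    width = 2 * Fraction(mu) / (d - 1)  # interval size from Lemma 2 at k = 1
    while width >= target:
        width /= d                      # one more iteration shrinks it by 1/d
        k += 1
    return k

# d = 2, mu = 1: the count grows with the number of states n = |V|.
print([iterations_needed(2, 1, n, 1) for n in (2, 3, 4)])
```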


Lower bound on number of iterations of VI. We establish a matching lower
bound of Ω(|V|) iterations to show that our analysis is tight.

Consider the sketch of a quantitative game in Fig. 1. Let all states belong
to the maximizing player. Hence, the optimization problem reduces to searching
for a path with optimal cost. Now let the loop on the right-hand side (RHS) be
larger than the loop on the left-hand side (LHS). For carefully chosen values of
w and lengths of the loops, one can show that the path for optimal cost of a
k-length game is along the RHS loop when k is small, but along the LHS loop
when k is large. This way, the correct maximal value can be obtained only at a
large value for k. Hence the VI algorithm runs for at least enough iterations that
the optimal path will be in the LHS loop. By meticulous reverse engineering of
the size of both loops and the value of w, one can guarantee that k = Ω(|V|).


3.3 Worst-case complexity analysis of VI for optimization 


Finally, we complete the worst-case complexity analysis of VI for optimization.
We account for the cost of arithmetic operations since they appear in abundance
in VI. We demonstrate that there are orders-of-magnitude of difference in
complexity under different models of arithmetic, namely unit-cost and bit-cost.


Unit-cost model. Under the unit-cost model of arithmetic, all arithmetic opera- 
tions are assumed to take constant time. 


Theorem 2. Let $G = (V, v_{init}, E, \gamma)$ be a quantitative graph game. Let μ > 0
be the maximum of absolute value of costs along transitions. The worst-case
complexity of the optimization problem under unit-cost model of arithmetic is

1. $O(|V| \cdot |E|)$ when discount factor d ≥ 2,
2. $O\left(\frac{\log \mu \cdot |E|}{d-1} + |V| \cdot |E|\right)$ when discount factor 1 < d < 2.


Proof. Each iteration takes O(|E|) cost since every transition is visited once. Thus,
the complexity is O(|E|) multiplied by the number of iterations (Theorem 1).


Bit-cost model. Under the bit-cost model, the cost of arithmetic operations
depends on the size of the numerical values. Integers are represented in their
bit-wise representation. Rational numbers $\frac{r}{s}$ are represented as a tuple of the bit-wise
representation of integers r and s. For two integers of length n and m, the cost
of their addition and multiplication is O(m + n) and O(m · n), respectively.


Theorem 3. Let $G = (V, v_{init}, E, \gamma)$ be a quantitative graph game. Let μ > 0 be
the maximum of absolute value of costs along transitions. Let $d = \frac{p}{q} > 1$ be the
discount factor. The worst-case complexity of the optimization problem under
the bit-cost model of arithmetic is

1. $O(|V|^2 \cdot |E| \cdot \log p \cdot \max\{\log \mu, \log p\})$ when d ≥ 2,
2. $O\left(\left(\frac{\log \mu}{d-1} + |V|\right)^2 \cdot |E| \cdot \log p \cdot \max\{\log \mu, \log p\}\right)$ when 1 < d < 2.


Proof (Sketch). Since arithmetic operations incur a cost and the length of
representation of intermediate costs increases linearly in each iteration, we can show
that the cost of conducting the j-th iteration is O(|E| · j · log μ · log p). Their
summation will return the given expressions.
Remarks on integer discount factor. Our analysis shows that when the discount
factor is an integer (d ≥ 2), VI requires O(|V|) iterations. Its worst-case
complexity is, therefore, $O(|V| \cdot |E|)$ and $O(|V|^2 \cdot |E|)$ under the unit-cost and bit-cost
models for arithmetic, respectively. From a practical point of view, the bit-cost
model is more relevant since implementations of VI will use multi-precision
libraries to avoid floating-point errors. While one may argue that the upper bounds
in Theorem 3 could be tightened, they would not improve significantly due to
the Ω(|V|) lower bound on number of iterations.


3.4 Satisficing via value-iteration 


We present our first algorithm for the satisficing problem. It is an adaptation of
VI. However, we see that it does not fare better than VI for optimization.
The VI-based algorithm for satisficing is described as follows: Perform VI for
optimization. Terminate as soon as one of these occurs: (a) VI completes as many
iterations as given by Theorem 1, or (b) the threshold value falls outside the interval
defined in Lemma 2. Either way, one can tell how the threshold value relates
to the optimal cost to solve satisficing. Clearly, (a) needs as many iterations as
optimization; (b) does not reduce the number of iterations in the worst case since
that number is inversely related to the distance between optimal cost and threshold value:


Theorem 4. Let $G = (V, v_{init}, E, \gamma)$ be a quantitative graph game with optimal
cost W. Let v ∈ Q be the threshold value. Then the number of iterations taken by a
VI-based algorithm for the satisficing problem is
$\min\left\{O(|V|),\; \log \frac{\mu}{|W - v|}\right\}$ if d ≥ 2
and $\min\left\{O\left(\frac{\log \mu}{d-1} + |V|\right),\; \log \frac{\mu}{|W - v|}\right\}$ if 1 < d < 2.


Observe that this bound is tight since the lower bounds from optimization
apply here as well. The worst-case complexity can be completed using similar
computations from § 3.3. Since the number of iterations is identical to Theorem 1,
the worst-case complexity will be identical to Theorem 2 and Theorem 3,
showing no theoretical improvement. However, its implementations may terminate
early for threshold values far from the optimal, but they will retain worst-case
behavior for ones closer to the optimal. The catch is that, since the optimal cost is
unknown a priori, this leads to a highly variable and non-robust performance.
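The early-termination behavior described above can be illustrated with a small sketch. It is a simplification made for illustration: the function name `vi_satisfice`, the game encoding, and the `max_iters` budget (standing in for the iteration bound of Theorem 1) are assumptions, and the sketch returns `None` instead of falling back to exact optimization when the budget is exhausted.

```python
from fractions import Fraction

def vi_satisfice(v0, v1, edges, v_init, d, mu, threshold, max_iters):
    """Relate the optimal cost W to `threshold` using the interval of Lemma 2.

    Returns "W > threshold", "W < threshold", or None if still undecided
    after max_iters iterations.
    """
    d, mu, threshold = Fraction(d), Fraction(mu), Fraction(threshold)
    wt = {v: Fraction(0) for v in list(v0) + list(v1)}
    for k in range(1, max_iters + 1):
        wt = {v: (max if v in v0 else min)(c + wt[w] / d for (w, c) in edges[v])
              for v in wt}
        radius = mu / ((d - 1) * d ** (k - 1))  # half-width of the interval
        if threshold < wt[v_init] - radius:
            return "W > threshold"
        if threshold > wt[v_init] + radius:
            return "W < threshold"
    return None

game = {"a": [("a", 1)]}  # one maximizing state, self-loop of cost 1, W = 2
for t in (0, 3, 2):
    print(t, vi_satisfice({"a"}, set(), game, "a", 2, 1, t, 20))
```

Thresholds far from W are resolved within a couple of iterations, while a threshold equal to W is never separated by the interval test, which is the non-robustness the text refers to.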


4 Satisficing via Comparators 


Our second algorithm for satisficing is purely based on automata-methods. While 
this approach operates with integer discount factors only, it runs linearly in 
the size of the quantitative game. This is lower than the number of iterations 
required by VI, let alone the worst-case complexities of VI. This approach reduces 
satisficing to solving a safety or reachability game using comparator automata. 

The intuition is as follows: Given threshold value v ∈ Q and relation R, let
the satisficing problem be to ensure cost of plays relates to v by R. Then, a play ρ
is winning for satisficing with v and R if its cost sequence A satisfies DS(A,d) R
v, where d > 1 is the discount factor. When d is an integer and v = 0, this simply
checks if A is in the safety/co-safety comparator, hence yielding the reduction.

The caveat is that the above applies to v = 0 only. To overcome this, we extend
the theory of comparators to permit arbitrary threshold values v ∈ Q. We find
that results from v = 0 carry over to v ∈ Q, and offer compact comparator
constructions (§ 4.1). These new comparators are then used to reduce satisficing
to develop an efficient and scalable algorithm (§ 4.2). Finally, to procure a
well-rounded view of its performance, we conduct an empirical evaluation where we
see this comparator-based approach outperform the VI approaches (§ 4.3).


4.1 Foundations of comparator automata with threshold v ∈ Q


This section extends the existing literature on comparators with threshold value
v = 0 [659] to permit non-zero thresholds. The properties we investigate are of
safety/co-safety and ω-regularity. We begin with formal definitions:


Definition 3 (Comparison language with threshold v ∈ Q). For an integer
upper bound μ > 0, discount factor d > 1, equality or inequality relation
R ∈ {≤, ≥, <, >, =, ≠}, and a threshold value v ∈ Q, the comparison language
with upper bound μ, relation R, discount factor d and threshold value v is a
language of infinite words over the alphabet Σ = {−μ, ..., μ} that accepts $A \in \Sigma^\omega$
iff DS(A,d) R v holds.


Definition 4 (Comparator automata with threshold v ∈ Q). For an integer
upper bound μ > 0, discount factor d > 1, equality or inequality relation
R ∈ {≤, ≥, <, >, =, ≠}, and a threshold value v ∈ Q, the comparator automaton
with upper bound μ, relation R, discount factor d and threshold value v is an
automaton that accepts the DS comparison language with upper bound μ, relation
R, discount factor d and threshold value v.


Safety and co-safety of comparison languages. The primary observation 
is that to determine if DS(A,d) R v holds, it should be sufficient to examine 
finite-length prefixes of A since weights later on get heavily discounted. Thus, 


Theorem 5. Let μ > 1 be the integer upper bound. For arbitrary discount factor
d > 1 and threshold value v ∈ Q

1. Comparison languages are safety languages for relations R ∈ {≤, ≥, =}.
2. Comparison languages are co-safety languages for relations R ∈ {<, >, ≠}.


Proof. The proof is identical to that for threshold value v = 0 from [9]. 


Regularity of comparison languages. Prior work on threshold value v = 0
shows that a comparator is ω-regular iff the discount factor is an integer [7]. We
show the same result for arbitrary threshold values v ∈ Q.

First of all, trivially, comparators with arbitrary threshold value are not
ω-regular for non-integer discount factors, since that already holds when v = 0.
The rest of this section proves ω-regularity with arbitrary threshold values
for integer discount factors. But first, let us introduce some notation:
Since v ∈ Q, w.l.o.g. we assume that it has an n-length representation
$v = v[0]v[1] \dots v[m](v[m+1]v[m+2] \dots v[n])^\omega$. By abuse of notation, we denote
both the expression $v[0]v[1] \dots v[m](v[m+1]v[m+2] \dots v[n])^\omega$ and the value
$DS(v[0]v[1] \dots v[m](v[m+1]v[m+2] \dots v[n])^\omega, d)$ by v.

We will construct a Büchi automaton for the comparison language $L_{\leq}$ for
relation ≤, threshold value v ∈ Q and an integer discount factor. This is sufficient
to prove ω-regularity for all relations since Büchi automata are closed under
complementation, union, and intersection.

From safety/co-safety of comparison languages, we argue it is sufficient to
examine the discounted-sum of finite-length weight sequences to know if their
infinite extensions will be in $L_{\leq}$. For instance, if the discounted-sum of a
finite-length weight-sequence W is very large, W could be a bad-prefix of $L_{\leq}$. Similarly,
if the discounted-sum of a finite-length weight-sequence W is very small then
for all of its infinite-length bounded extensions Y, DS(W · Y, d) ≤ v. Thus, a
mathematical characterization of very large and very small would formalize a
criterion for membership of sequences in $L_{\leq}$ based on their finite-prefixes.

To this end, we use the concept of a recoverable gap (or gap value), which is a
measure of distance of the discounted-sum of a finite-sequence from 0 [12]. The
recoverable gap of a finite weight-sequence W with discount factor d, denoted
gap(W, d), is defined as follows: If W = ε (the empty sequence), gap(ε, d) = 0,
and $gap(W, d) = d^{|W|-1} \cdot DS(W, d)$ otherwise. Then, Lemma 3 formalizes very
large and very small in Item 1 and Item 2, respectively, w.r.t. recoverable gaps.
As for notation, given a sequence A, let $A[\cdots i]$ denote its i-length prefix:


Lemma 3. Let μ > 0 be the integer upper bound, d > 1 be the discount factor.
Let v ∈ Q be the threshold value s.t. $v = v[0] \dots v[m](v[m+1] \dots v[n])^\omega$. Let W
be a non-empty, bounded, finite-length weight-sequence.

1. $gap(W - v[\cdots|W|], d) > \frac{1}{d} \cdot DS(v[|W|\cdots], d) + \frac{\mu}{d-1}$ iff for all infinite-length,
   bounded extensions Y, $DS(W \cdot Y, d) > v$

2. $gap(W - v[\cdots|W|], d) \leq \frac{1}{d} \cdot DS(v[|W|\cdots], d) - \frac{\mu}{d-1}$ iff for all infinite-length,
   bounded extensions Y, $DS(W \cdot Y, d) \leq v$


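Proof. We present the proof of one direction of Item 1. The others follow
similarly. Let W be s.t. for every infinite-length, bounded Y, $DS(W \cdot Y, d) > v$
holds. Then $DS(W, d) + \frac{1}{d^{|W|}} \cdot DS(Y, d) > DS(v[\cdots|W|] \cdot v[|W|\cdots], d)$ implies
$DS(W, d) - DS(v[\cdots|W|], d) > \frac{1}{d^{|W|}} \cdot (DS(v[|W|\cdots], d) - DS(Y, d))$ implies
$gap(W - v[\cdots|W|], d) > \frac{1}{d}\left(DS(v[|W|\cdots], d) + \frac{\mu \cdot d}{d-1}\right)$,
where the last step instantiates Y with the constant sequence $(-\mu)^\omega$, whose
discounted sum is $-\frac{\mu \cdot d}{d-1}$.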


This segues into the state-space of the Büchi automaton. We define the state
space so that state s represents the gap value s. The idea is that all finite-length
weight sequences with gap value s will terminate in state s. To assign transitions
between these states, we observe that gap value is defined inductively as follows:
gap(ε, d) = 0 and gap(W · w, d) = d · gap(W, d) + w, where w ∈ {−μ, ..., μ}. Thus
there is a transition from state s to state t on a ∈ {−μ, ..., μ} if t = d · s + a.
Since gap(ε, d) = 0, state 0 is assigned to be the initial state.
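The inductive definition of the gap value and its closed form can be checked against each other on small inputs. The sketch below is illustrative; the function names are hypothetical, and it assumes the convention $DS(W, d) = \sum_i W[i]/d^i$.

```python
from fractions import Fraction

def gap_inductive(weights, d):
    """gap(eps, d) = 0 and gap(W.w, d) = d * gap(W, d) + w."""
    g = 0
    for w in weights:
        g = d * g + w  # one automaton transition: t = d * s + a
    return g

def gap_closed_form(weights, d):
    """gap(W, d) = d**(|W| - 1) * DS(W, d) for non-empty W, and 0 otherwise."""
    if not weights:
        return Fraction(0)
    ds = sum(Fraction(w) / Fraction(d) ** i for i, w in enumerate(weights))
    return Fraction(d) ** (len(weights) - 1) * ds

print(gap_inductive([1, 2, 3], 2), gap_closed_form([1, 2, 3], 2))
```

With an integer discount factor and integer weights, the inductive version only ever produces integers, which is why the states of the automaton can be taken to be integers.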
The issue with this construction is that it has infinitely many states. To limit that, we
use Lemma 3. Since Item 1 is a necessary and sufficient criterion for bad prefixes
of the safety language $L_{\leq}$, all states with value larger than the bound in Item 1 are fused into
one non-accepting sink. For the same reason, all states with gap value less than
the bound in Item 1 are accepting states. Due to Item 2, all states with value less than the bound in Item 2
are fused into one accepting sink. Finally, since d is an integer, gap values are
integral. Thus, there are only finitely many states between the bounds in Item 2 and Item 1.


Theorem 6. Let μ > 0 be an integer upper bound, d > 1 an integer discount
factor, R an equality or inequality relation, and v ∈ Q the threshold value with an
n-length representation given by $v = v[0]v[1] \dots v[m](v[m+1]v[m+2] \dots v[n])^\omega$.

1. The DS comparator automaton for μ, d, R, v is ω-regular iff d is an integer.
2. For integer discount factors, the DS comparator is a safety or co-safety
   automaton with $O\left(\frac{\mu \cdot n}{d-1}\right)$ states.


Proof. To prove Item 1, we present the construction of an ω-regular comparator
automaton for integer upper bound μ > 0, integer discount factor d > 1,
inequality relation ≤, and threshold value v ∈ Q s.t.
$v = v[0]v[1] \dots v[m](v[m+1]v[m+2] \dots v[n])^\omega$, denoted by $A = (S, s_I, \Sigma, \delta, F)$ where:
For i ∈ {0, ..., n}, let $U_i = \frac{1}{d} \cdot DS(v[i\cdots], d) + \frac{\mu}{d-1}$ (Lemma 3, Item 1)
For i ∈ {0, ..., n}, let $L_i = \frac{1}{d} \cdot DS(v[i\cdots], d) - \frac{\mu}{d-1}$ (Lemma 3, Item 2)
— $S = \bigcup_{i=0}^{n} S_i \cup \{bad, veryGood\}$ where $S_i = \{(s, i) \mid s \in \{\lfloor L_i \rfloor + 1, \dots, \lfloor U_i \rfloor\}\}$
— Initial state $s_I = (0, 0)$, Accepting states $F = S \setminus \{bad\}$
— Alphabet Σ = {−μ, −μ+1, ..., μ−1, μ}
— Transition function δ ⊆ S × Σ → S where (s, a, t) ∈ δ then:
  1. If s ∈ {bad, veryGood}, then t = s for all a ∈ Σ
  2. If s is of the form (p, i), and a ∈ Σ
     (a) If $d \cdot p + a - v[i] > \lfloor U_i \rfloor$, then t = bad
     (b) If $d \cdot p + a - v[i] \leq \lfloor L_i \rfloor$, then t = veryGood
     (c) If $\lfloor L_i \rfloor < d \cdot p + a - v[i] \leq \lfloor U_i \rfloor$,
         i. If i == n, then $t = (d \cdot p + a - v[i], m + 1)$
         ii. Else, $t = (d \cdot p + a - v[i], i + 1)$


We skip the proof of correctness as it follows from the above discussion. Observe, A
is deterministic. It is a safety automaton as all non-accepting states are sinks.
To prove Item 2, observe that since the comparator for ≤ is a deterministic
safety automaton, the comparator for > is obtained by simply flipping the
accepting and non-accepting states. This is a co-safety automaton of the same
size. One can argue similarly for the remaining relations.
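To make the construction tangible, the following sketch builds the transition function of a comparator for relation ≤ on the fly and runs it on lasso-shaped words. It is an illustration under stated assumptions rather than the authors' implementation: the names `make_step` and `accepts` are hypothetical, the threshold is passed as the integer lists `v_head` and `v_loop` of its representation, and the bounds of Lemma 3 are evaluated at the position of v reached after reading the letter, which is how this sketch interprets the lemma for the extended prefix.

```python
from fractions import Fraction
from math import floor

def ds_lasso(head, loop, d):
    """Discounted sum of head . loop^omega with DS(A, d) = sum_i A[i] / d**i."""
    d = Fraction(d)
    head_sum = sum(Fraction(a) / d**i for i, a in enumerate(head))
    loop_sum = sum(Fraction(b) / d**j for j, b in enumerate(loop))
    return head_sum + loop_sum / (1 - d ** -len(loop)) / d ** len(head)

def make_step(mu, d, v_head, v_loop):
    """Transition function of a comparator sketch for DS(A, d) <= v (integer d)."""
    v = list(v_head) + list(v_loop)
    n, loop_start = len(v) - 1, len(v_head)
    slack = Fraction(mu, d - 1)
    # Floors of the Lemma 3 bounds for the suffix of v starting at position i.
    upper = [floor(ds_lasso(v[i:], v_loop, d) / d + slack) for i in range(n + 1)]
    lower = [floor(ds_lasso(v[i:], v_loop, d) / d - slack) for i in range(n + 1)]

    def step(state, a):
        if state in ("bad", "veryGood"):
            return state                     # both sinks are absorbing
        gap, i = state
        j = loop_start if i == n else i + 1  # position in v after reading a
        new_gap = d * gap + a - v[i]
        if new_gap > upper[j]:
            return "bad"
        if new_gap <= lower[j]:
            return "veryGood"
        return (new_gap, j)

    return step

def accepts(step, w_head, w_loop):
    """Run the safety automaton on w_head . w_loop^omega; accept iff bad is avoided."""
    state = (0, 0)
    for a in w_head:
        state = step(state, a)
    seen = set()
    while state not in ("bad", "veryGood") and state not in seen:
        seen.add(state)
        for a in w_loop:
            state = step(state, a)
    return state != "bad"

step = make_step(2, 2, [3], [0])             # mu = 2, d = 2, v = 3
print(accepts(step, [], [2]), accepts(step, [2, 2], [0]))
```

Because the automaton is deterministic and has finitely many states, the run on a lasso word either hits a sink or revisits a state at a loop boundary, so `accepts` always terminates.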


4.2 Satisficing via safety and reachability games 


This section describes our comparator-based linear-time algorithm for satisficing 
for integer discount factors. 
As described earlier, given discount factor d > 1, a play is winning for
satisficing with threshold value v ∈ Q and relation R if its cost sequence A satisfies
DS(A,d) R v. We now know from Theorem 6 that the winning condition for
plays can be expressed as a safety or co-safety automaton for any v ∈ Q as long
as the discount factor is an integer. Therefore, a synchronized product of the
quantitative game with the safety or co-safety comparator denoting the winning
condition completes the reduction to a safety or reachability game, respectively.


Theorem 7. Let $G = (V, v_{init}, E, \gamma)$ be a quantitative game, d > 1 the integer
discount factor, R the equality or inequality relation, and v ∈ Q the threshold
value with an n-length representation. Let μ > 0 be the maximum of absolute
values of costs along transitions in G. Then,

1. The satisficing problem reduces to solving a safety game if R ∈ {≤, ≥}
2. The satisficing problem reduces to solving a reachability game if R ∈ {<, >}
3. The satisficing problem is solved in O((|V| + |E|) · μ · n) time.


Proof. The first two points use a standard synchronized product argument on the
following formal reduction [15]: Let $G = (V = V_0 \uplus V_1, v_{init}, E, \gamma)$ be a quantitative
game, d > 1 the integer discount factor, R the equality or inequality relation,
and v ∈ Q the threshold value with an n-length representation. Let μ > 0 be
the maximum of absolute values of costs along transitions in G. Then, the first
step is to construct the safety/co-safety comparator $A = (S, s_I, \Sigma, \delta, F)$ for μ,
d, R and v. The next is to synchronize the product of G and A over weights to
construct the game $GA = (W = W_0 \cup W_1, s_0 \times init, \delta_W, F_W)$, where

— $W = V \times S$. In particular, $W_0 = V_0 \times S$ and $W_1 = V_1 \times S$. Since $V_0$ and $V_1$
  are disjoint, $W_0$ and $W_1$ are disjoint too.

— Let $s_0 \times init$, i.e., the pair $(v_{init}, s_I)$ of the initial states of G and A, be the
  initial state of GA.

— Transition relation $\delta_W \subseteq W \times W$ is defined such that transition $((v, s), (v', s'))$
  $\in \delta_W$ synchronizes between transitions $(v, v') \in E$ and $(s, a, s') \in \delta$ if
  $a = \gamma((v, v'))$ is the cost of transition in G.

— $F_W = V \times F$. The game is a safety game if the comparator is a safety
  automaton and a reachability game if the comparator is a co-safety automaton.


We need the size of GA to analyze the worst-case complexity. Clearly, GA
consists of O(|V| · μ · n) states. To establish the number of transitions in GA,
observe that every state (v, s) in GA has the same number of outgoing edges as
state v in G because the comparator A is deterministic. Since GA has O(μ · n)
copies of every state v ∈ G, there are a total of O(|E| · μ · n) transitions in GA.
Since GA is either a safety or a reachability game, it is solved in time linear in
its size. Thus, the overall complexity is O((|V| + |E|) · μ · n).
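The linear-time step referred to above is typically realized with an attractor computation. The sketch below shows the classical worklist version for a generic two-player graph game; it is a textbook technique offered as an illustration, and the function name `attractor` as well as the dictionary encoding of the game are assumptions rather than details taken from the paper.

```python
from collections import deque

def attractor(succ, owner, target, player):
    """States from which `player` can force a visit to `target`.

    succ: dict state -> list of successors (every state has at least one).
    owner: dict state -> 0 or 1, the player who moves in that state.
    Each transition is inspected a constant number of times.
    """
    pred = {s: [] for s in succ}
    for s, ts in succ.items():
        for t in ts:
            pred[t].append(s)
    remaining = {s: len(ts) for s, ts in succ.items()}  # successors not yet attracted
    attr = set(target)
    queue = deque(target)
    while queue:
        t = queue.popleft()
        for s in pred[t]:
            if s in attr:
                continue
            remaining[s] -= 1
            # The player needs one good move; the opponent must have none left.
            if owner[s] == player or remaining[s] == 0:
                attr.add(s)
                queue.append(s)
    return attr

succ = {"a": ["b", "c"], "b": ["bad", "c"], "c": ["c"], "bad": ["bad"]}
owner = {"a": 0, "b": 1, "c": 0, "bad": 0}
print(sorted(attractor(succ, owner, {"bad"}, 1)))
```

In a safety game, the safety player wins exactly from the states outside the opponent's attractor of the unsafe states; a reachability game is solved by the attractor of the goal states directly.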


With respect to the value μ, the VI-based solutions are logarithmic in the
worst case, while the comparator-based solution is linear due to the size of the
comparator. From a practical perspective, this may not be a limitation since weights
along transitions can be scaled down. The parameter that cannot be altered is
the size of the quantitative game. With respect to that, the comparator-based
solution displays clear superiority. Finally, the comparator-based solution is
affected by n, the length of the representation of the threshold value, while the
VI-based solution is not. It is natural to assume that the value of n is small.

Fig. 2. Cactus plot. μ = 5, v = 3. Total benchmarks = 291

Fig. 3. Single counter scalable benchmark. μ = 5, v = 3. Timeout = 500s.


4.3 Implementation and Empirical Evaluation 


The goal of the empirical analysis is to determine whether the practical
performance of these algorithms agrees with our theoretical findings.

For an apples-to-apples comparison, we implement three algorithms: (a)
VIOptimal: Optimization via value-iteration, (b) VISatisfice: Satisficing via
value-iteration, and (c) CompSatisfice: Satisficing via comparators. All tools have been
implemented in C++. To avoid floating-point errors in VIOptimal and VISatisfice,
the tools invoke the open-source GMP (GNU Multi-Precision) library [2]. Since all
arithmetic operations in CompSatisfice are integral only, it does not use GMP.

To avoid completely randomized benchmarks, we create ~290 benchmarks
from the $LTL_f$ benchmark suite [29]. The state-of-the-art $LTL_f$-to-automaton tool
Lisa [8] is used to convert $LTL_f$ to (non-quantitative) graph games. Weights are
randomly assigned to transitions. The number of states in our benchmarks ranges
from 3 to 50000+. Discount factor d = 2, threshold v ∈ [0 − 10]. Experiments
were run on 8 CPU cores at 2.4GHz, 16GB RAM on a 64-bit Linux machine.


Observations and Inferences. Overall, we see that CompSatisfice is efficient and
scalable, and exhibits steady and predictable performance.


CompSatisfice outperforms VIOptimal in both runtime and number of
benchmarks solved, as shown in Fig. 2. It is crucial to note that all benchmarks solved
by VIOptimal had fewer than 200 states. In contrast, CompSatisfice solves much
larger benchmarks, ranging from 3 to 50000+ states.

To test scalability, we compared both tools on a set of scalable benchmarks.
For integer parameter i > 0, the i-th scalable benchmark has $3 \cdot 2^i$ states. Fig. 3
plots number-of-states to runtime in log-log scale. Therefore, the slope of the
straight line will indicate the degree of polynomial (in practice). It shows us
that CompSatisfice exhibits linear behavior (slope ~1), whereas VIOptimal is
much more expensive (slope >> 1) even in practice.

Fig. 4. Robustness. Fix benchmark, vary v. μ = 5. Timeout = 500s.


CompSatisfice is more robust than VISatisfice. We compare CompSatisfice and
VISatisfice as the threshold value changes. This experiment is chosen due to
Theorem 4, which proves that VISatisfice is non-robust. As shown in Fig. 4, the
variance in performance of VISatisfice is very high. The appearance of a peak close
to the optimal value is an empirical demonstration of Theorem 4. On the other
hand, CompSatisfice stays steady in performance owing to its low complexity.


5 Adding Temporally Extended Goals 


Having witnessed algorithmic improvements of comparator-based satisficing over 
VI-based algorithms, we now shift focus to the question of applicability. While 
this section examines this with respect to the ability to extend to temporal 
goals, this discussion highlights a core strength of comparator-based reasoning 
in satisficing and shows its promise in a broader variety of problems. 

The problem of extending optimal/satisficing solutions with a temporal goal
is to determine whether there exists an optimal/satisficing solution that also
satisfies a given temporal goal. Formally, given a quantitative game G, a labeling
function $L : V \to 2^{AP}$ which assigns states V of G to atomic propositions from
the set AP, and a temporal goal φ over AP, we say a play $\rho = v_0 v_1 \dots$ satisfies
φ if its proposition sequence given by $L(v_0)L(v_1)\dots$ satisfies the formula φ.
Then to solve optimization/satisficing with a temporal goal is to determine if
there exists a solution that is optimal/satisficing and also satisfies the temporal
goal along resulting plays. Prior work has proven that the optimization problem
cannot be extended to temporal goals [13] unless the temporal goals are very
simple safety properties [10, 31]. In contrast, our comparator-based solution for
satisficing can naturally be extended to temporal goals, in fact to all ω-regular
properties, owing to its automata-based underpinnings, as shown below:
Theorem 8. Let G be a quantitative game with state set V, $L : V \to 2^{AP}$ be a
labeling function over set of atomic propositions AP, and φ be a temporal goal
over AP and $A_\varphi$ be its equivalent deterministic parity automaton. Let d > 1 be
an integer discount factor, μ be the maximum of the absolute values of costs along
transitions, and v ∈ Q be the threshold value with an n-length representation.
Then, solving satisficing with temporal goals reduces to solving a parity game of
size linear in |V|, μ, n and $|A_\varphi|$.


Proof. The reduction involves two steps of synchronized products. The first
reduces the satisficing problem to a safety/reachability game while preserving
the labelling function. The second synchronized product is between the
safety/reachability game and the DPA $A_\varphi$. These will synchronize on the atomic
propositions in the labeling function and DPA transitions, respectively.
Therefore, the resulting parity game will be linear in |V|, μ, n, and $|A_\varphi|$.


Broadly speaking, our ability to solve satisficing via automata-based methods
is a key feature as it enables a seamless integration of quantitative properties
(threshold bounds) with qualitative properties, as both are grounded in
automata-based methods. VI-based solutions are ill-suited to do so since numerical
methods are known to not combine well with automata-based methods, which
are so prominent in qualitative reasoning [5, 20]. This key feature could be
exploited in several other problems to show further benefits of comparator-based
satisficing over optimization and VI-based methods.


6 Concluding remarks 


This work introduces the satisficing problem for quantitative games with the 
discounted-sum cost model. When the discount factor is an integer, we present 
a comparator-based solution for satisficing, which exhibits algorithmic improve- 
ments — better worst-case complexity and efficient, scalable, and robust per- 
formance — as well as broader applicability over traditional solutions based on 
numerical approaches for satisficing and optimization. Other technical contri- 
butions include the presentation of the missing proof of value-iteration for opti- 
mization and the extension of comparator automata to enable direct comparison 
to arbitrary threshold values as opposed to zero threshold value only. 

An undercurrent of our comparator-based approach for satisficing is that it 
offers an automata-based replacement to traditional numerical methods. By do- 
ing so, it paves a way to combine quantitative and qualitative reasoning without 
compromising on theoretical guarantees or even performance. This motivates 
tackling more challenging problems in this area, such as more complex environ- 
ments, variability in information availability, and their combinations. 
Abstract. It is well-known that the winning region of a parity game
with n nodes and k priorities can be computed as a k-nested fixpoint
of a suitable function; straightforward computation of this nested
fixpoint requires $O(n^{k/2})$ iterations of the function. Calude et al.’s recent
quasipolynomial-time parity game solving algorithm essentially shows 
how to compute the same fixpoint in only quasipolynomially many itera- 
tions by reducing parity games to quasipolynomially sized safety games. 
Universal graphs have been used to modularize this transformation of 
parity games to equivalent safety games that are obtained by combin- 
ing the original game with a universal graph. We show that this ap- 
proach naturally generalizes to the computation of solutions of systems 
of any fixpoint equations over finite lattices; hence, the solution of fix- 
point equation systems can be computed by quasipolynomially many 
iterations of the equations. We present applications to modal fixpoint 
logics and games beyond relational semantics. For instance, the model 
checking problems for the energy μ-calculus, finite latticed μ-calculi, and
the graded and the (two-valued) probabilistic μ-calculus, with numbers
coded in binary, can be solved via nested fixpoints of functions that
differ substantially from the function for parity games but still can be
computed in quasipolynomial time; our result hence implies that model
checking for these μ-calculi is in QP. Moreover, we improve the exponent
in known exponential bounds on satisfiability checking. 


games, energy games, μ-calculus


1 Introduction 


Fixpoints are pervasive in computer science, governing large portions of recur- 
sion theory, concurrency theory, logic, and game theory. One famous example 
are parity games, which are central, e.g., to networks and infinite processes [5], 
tree automata [43], and μ-calculus model checking [17]. Winning regions in parity
games can be expressed as nested fixpoints of particular set functions (e.g. [8,16]). 
In recent breakthrough work on the solution of parity games in quasipolynomial 
time, Calude et al. [9] essentially show how to compute this particular fixpoint
in quasipolynomial time, that is, in time $2^{O((\log n)^c)}$ for some constant c.
Subsequently, it has been shown [13,14,28] that universal graphs (that is, even graphs
into which every even graph of a certain size embeds by a graph morphism) can 
be used to transform parity games to equivalent safety games obtained by pairing 
the original game with a universal graph; the size of these safety games is deter- 
mined by the size of the employed universal graphs and it has been shown [13,14] 
that there are universal graphs of quasipolynomial size. This yields a uniform 
algorithm for solving parity games to which all currently known quasipolynomial 
algorithms for parity games have been shown to instantiate using appropriately 
defined universal graphs [13, 14]. 

Briefly, our contribution in the present work is to show that the method of 
using universal graphs to solve parity games generalizes to the computation of 
nested fixpoints of arbitrary functions over finite lattices. That is, given functions 
$f_i : P(U)^{k+1} \to P(U)$, $0 \le i \le k$, on a finite lattice $U$, we give an algorithm that
uses universal graphs to compute the solutions of systems of equations 


\[ X_i =_{\eta_i} f_i(X_0, \ldots, X_k) \qquad 0 \le i \le k \]


where $\eta_i = \mathrm{GFP}$ (greatest fixpoint) or $\eta_i = \mathrm{LFP}$ (least fixpoint). Since there are
universal graphs of quasipolynomial size, the algorithm requires only
quasipolynomially many iterations of the functions $f_i$ and hence runs in quasipolynomial
time, provided that all $f_i$ are computable in quasipolynomial time. While it
seems plausible that this time bound may also be obtained by translating
equation systems to equivalent standard parity games by emulating Turing machines
to encode the functions $f_i$ as Boolean circuits (leading to many additional states
but avoiding exponential blowup during the process), we emphasize that the 
main point of our result is not so much the ensuing time bound but rather the 
insight that universal graphs and hence many algorithms for parity games can 
be used on a much more general level which yields a precise (and relatively low) 
quasipolynomial bound on the number of function calls that are required to 
obtain solutions of fixpoint equation systems. 

In more detail, the method of Calude et al. can be described as annotating 
nodes of a parity game with histories of quasipolynomial size and then solving 
this annotated game, but with a safety winning condition instead of the much 
more involved parity winning condition. It has been shown that these histories 
can be seen as nodes in universal graphs, in a more general reduction of parity 
games to safety games in which nodes from the parity game are annotated with 
nodes from a universal graph. This method has also been described as pairing 
separating automata with safety games [14]. It has been shown [13,14] that there 
are exponentially sized universal graphs (essentially yielding the basis for e.g. the 
fixpoint iteration algorithm [8] or the small progress measures algorithm [27]) and 
quasipolynomially sized universal graphs (corresponding, e.g., to the succinct 
progress measure algorithm [28], or to the recent quasipolynomial variant of 
Zielonka’s algorithm [38]).

Hasuo et al. [22], and more generally, Baldan et al. [4] show that nested 
fixpoints in highly general settings can be computed by a technique based on 
progress measures, implicitly using exponentially sized universal graphs,
obtaining an exponential bound on the number of iterations. Our technique is based
on showing that one can make explicit use of universal graphs, correspondingly 
obtaining a quasipolynomial upper bound on the number of iterations. In both 
cases, computation of the nested fixpoint is reduced to a single (least or greatest 
depending on exact formulation) fixpoint of a function that extends the given 
set function to keep track of the exponential and quasipolynomial histories, re- 
spectively, in analogy to the previous reduction of parity games to safety games. 
Our central result can then be phrased as saying that the method of trans- 
forming parity conditions to safety conditions using universal graphs generalizes 
from solving parity games to solving systems of equations that use arbitrary 
functions over finite lattices. We use fixpoint games [4,42] to obtain the cru- 
cial result that the solutions of equation systems have history-free witnesses, 
in analogy to history-freeness of winning strategies in parity games. These fix- 
point games have exponential size but we show how to extract polynomial-size 
witnesses for winning strategies of Eloise, and use these witnesses to show that 
any node won by Eloise is also won in the safety game obtained by a universal 
graph. For the backwards direction, we show that a witness for satisfaction of 
the safety condition regarding the universal graph induces a winning strategy 
in the fixpoint game. This proves that universal graphs can be used to compute 
nested fixpoints of arbitrary functions over finite lattices and hence yields the 
quasipolynomial upper bound for computation of nested fixpoints. Moreover, we 
present a progress measure algorithm that uses the nodes of a quasipolynomial 
universal graph to measure progress and that can be used to efficiently compute 
nested fixpoints of arbitrary functions over finite lattices. 


As an immediate application of these results, we improve known deterministic 
algorithms for solving energy parity games [10], that is, parity games in which 
edges have additional integer weights and for which the winning condition is 
a combined parity condition and a (quantitative) positivity condition on the 
sum of the accumulated weights. Our results also show that the model checking 
problem for the associated energy μ-calculus [2] is in QP. In a similar fashion,
we obtain quasipolynomial algorithms for model checking in latticed μ-calculi [7]
in which the truth values of formulae are computed over arbitrary finite lattices, 
and for solving associated latticed parity games [30]. 


Furthermore, our results improve generic upper complexity bounds on model 
checking and satisfiability checking in the coalgebraic μ-calculus [12], which
serves as a generic framework for fixpoint logics beyond relational semantics.
Well-known instances of the coalgebraic μ-calculus include the
alternating-time μ-calculus [1], the graded μ-calculus [32], the (two-valued) probabilistic
μ-calculus [12,34], and the monotone μ-calculus [18] (the ambient fixpoint logic
of concurrent dynamic logic CPDL [39] and Parikh’s game logic [37]). This level 
of generality is achieved by abstracting system types as set functors and sys- 
tems as coalgebras for the given functor following the paradigm of universal 
coalgebra [40]. It was previously shown [24] that the model checking problem 
for coalgebraic μ-calculi reduces to the computation of a nested fixpoint. This
fixpoint may be seen as a coalgebraic generalization of a parity game winning
region but can be literally phrased in terms of small standard parity games 
(implying quasipolynomial run time) only in restricted cases. Our results show 
that the relevant nested fixpoint can be computed in quasipolynomial time in 
all cases of interest. Notably, we thus obtain as new specific upper bounds that 
even under binary coding of numbers, the model checking problems of both the 
graded μ-calculus and the probabilistic μ-calculus are in QP, even when the
syntax is extended to allow for (monotone) polynomial inequalities. 

Similarly, the satisfiability problem of the coalgebraic μ-calculus has been
reduced to a computation of a nested fixpoint [25], and our present results imply 
a marked improvement in the exponent of the associated exponential time bound. 
Specifically, the nesting depth of the relevant fixpoint is exponentially smaller 
than the basis of the lattice. Our results imply that this fixpoint is computable in 
polynomial time so that the complexity of satisfiability checking in coalgebraic
μ-calculi drops from $2^{O(n^2 k^2 \log n)}$ to $2^{O(nk \log n)}$ for formulae of size $n$ and with
alternation depth $k$.


Related Work The quasipolynomial bound on parity game solving has in the 
meantime been realized by a number of alternative algorithms. For instance,
Jurdziński and Lazić [28] use succinct progress measures to improve to quasilinear
(instead of quasipolynomial) space; Fearnley et al. [19] similarly achieve quasilin- 
ear space. Lehtinen [33] and Boker and Lehtinen [6] present a quasipolynomial 
algorithm using register games. Parys [38] improves Zielonka’s algorithm [43] 
to run in quasipolynomial time. In particular the last algorithm is of interest 
as an additional candidate for generalization to nested fixpoints, due to the 
known good performance of Zielonka’s algorithm in practice. Daviaud et al. [15] 
generalize quasipolynomial-time parity game solving by providing a pseudo- 
quasipolynomial algorithm for mean-payoff parity games. On the other hand, 
Czerwiński et al. [14] give a quasipolynomial lower bound on universal trees,
implying a barrier for prospective polynomial-time parity game solving algorithms.
Chatterjee et al. [11] describe a quasipolynomial time set-based symbolic algo- 
rithm for parity game solving that is parametric in a lift function that determines 
how ranks of nodes depend on the ranks of their successors, and thereby unifies 
the complexity and correctness analysis of various parity game algorithms. Al- 
though part of the parity game structure is encapsulated in a set operator CPre, 
the development is tied to standard parity games, e.g. in the definition of the 
best function, which picks minimal or maximal ranks of successors depending on 
whether a node belongs to Abelard or Eloise. 

Early work on the computation of unrestricted nested fixpoints has shown 
that greatest fixpoints require less effort in the fixpoint iteration algorithm, which 
can hence be optimized to compute nested fixpoints with just $O(n^{k/2})$ calls of
the functions at hand [35,41], improving the previously known (straightforward)
bound $O(n^k)$; here, $n$ denotes the size of the basis of the lattice and $k$ the number
of fixpoint operators. Recent progress in the field has established the above- 
mentioned approaches using progress measures [22] and fixpoint games [4] in 
general settings, both with a view to applications in coalgebraic model checking 
like in the present paper. In comparison to the present work, the respective
bounds on the required number of function iterations in the above unrestricted 
approaches all are exponential. 

Our present results, specifically the quasipolynomial upper
bound on function iteration in fixpoint computation, have been available as an
arXiv preprint for some time [23]. Subsequent to this preprint, Arnold,
Niwiński and Parys [3] have improved the actual run time by reducing the overhead
incurred per iteration (and they give a form of quasipolynomial lower bound for 
universal-tree-based algorithms), working (like [23]) in the less general setting of 
directly nested fixpoints over powerset lattices; we show in Section 6 how such 
an improvement can be incorporated also in our lattice-based algorithm. 


2 Notation and Preliminaries 


Let $U$ and $V$ be sets, and let $R \subseteq U \times U$ be a binary relation on $U$. For
$u \in U$, we then put $R(u) := \{v \in U \mid (u,v) \in R\}$. We put $[k] = \{0, \ldots, k\}$ for
$k \in \mathbb{N}$. Labelled graphs $G = (W, R)$ consist of a set $W$ together with a relation
$R \subseteq W \times A \times W$ where $A$ is some set of labels; typically, we use $A = [k]$
for some $k \in \mathbb{N}$. An $R$-path in a labelled graph is a finite or infinite sequence
$v_0, a_0, v_1, a_1, v_2, \ldots$ (ending in a node from $W$ if finite) such that $(v_i, a_i, v_{i+1}) \in R$
for all $i$. For $v \in W$ and $a \in A$, we put $R_a(v) = \{w \in W \mid (v,a,w) \in R\}$ and
sometimes write $|G|$ to refer to $|W|$. As usual, we write $U^*$ and $U^\omega$ for the sets of
finite sequences or infinite sequences, respectively, of elements of $U$. The domain
$\mathrm{dom}(f)$ of a partial function $f : U \rightharpoonup V$ is the set of elements on which $f$ is
defined. As usual, the (forward) image of $A' \subseteq A$ under a function $f : A \to B$
is $f[A'] = \{b \in B \mid \exists a \in A'.\ f(a) = b\}$ and the preimage $f^{-1}[B']$ of $B' \subseteq B$
under $f$ is defined by $f^{-1}[B'] = \{a \in A \mid \exists b \in B'.\ f(a) = b\}$. Projections
$\pi_j : A_1 \times \ldots \times A_m \to A_j$ for $1 \le j \le m$ are given by $\pi_j(a_1, \ldots, a_m) = a_j$. We
often regard (finite or infinite) sequences $\tau = u_0, u_1, \ldots \in U^* \cup U^\omega$ of elements of $U$ as
partial functions of type $\mathbb{N} \rightharpoonup U$ and then write $\tau(i)$ to denote the element $u_i$,
for $i \in \mathrm{dom}(\tau)$. For $\tau \in U^* \cup U^\omega$, we define the set
$\mathrm{Inf}(\tau) = \{u \in U \mid \forall i \ge 0.\ \exists j > i.\ \tau(j) = u\}$ of elements that occur
infinitely often in $\tau$ (so $\mathrm{Inf}(\tau) = \emptyset$
for $\tau \in U^*$). An infinite $R$-path $v_0, p_0, v_1, p_1, \ldots$ in a labelled graph $G = (W, R)$
with labels from $[k]$ is even if $\max(\mathrm{Inf}(p_0, p_1, \ldots))$ is even, and $G$ is even if every
infinite $R$-path in $G$ is even. We write $P(U)$ for the powerset of $U$, and $U^m$ for
the $m$-fold Cartesian product $U \times \cdots \times U$.


Finite Lattices and Fixpoints A finite lattice $(L, \sqsubseteq)$ (often written just as $L$)
consists of a non-empty finite set $L$ together with a partial order $\sqsubseteq$ on $L$, such
that there is, for all subsets $X \subseteq L$, a join $\bigsqcup X$ and a meet $\bigsqcap X$. The least and
greatest elements of $L$ are defined as $\bot = \bigsqcup \emptyset$ and $\top = \bigsqcap \emptyset$, respectively.
A set $B_L \subseteq L$ such that $l = \bigsqcup \{b \in B_L \mid b \sqsubseteq l\}$ for all $l \in L$ is a basis of $L$. Given a finite
lattice $L$, a function $g : L^k \to L$ is monotone if $g(V_1, \ldots, V_k) \sqsubseteq g(W_1, \ldots, W_k)$
whenever $V_i \sqsubseteq W_i$ for all $1 \le i \le k$. For monotone $f : L \to L$, we put

\[ \mathrm{GFP}\, f = \bigsqcup \{V \in L \mid V \sqsubseteq f(V)\} \qquad \mathrm{LFP}\, f = \bigsqcap \{V \in L \mid f(V) \sqsubseteq V\}, \]

which, by the Knaster-Tarski fixpoint theorem, are the greatest and the least
fixpoint of $f$, respectively. Furthermore, we define $f^0(V) = V$ and $f^{m+1}(V) =
f(f^m(V))$ for $m \ge 0$, $V \in L$; since $L$ is finite, we have $\mathrm{GFP}\, f = f^n(\top)$ and
$\mathrm{LFP}\, f = f^n(\bot)$ for sufficiently large $n$ (e.g. $n = |L|$) by Kleene’s fixpoint theorem.
Given a finite set $U$ and a natural number $n$, $(n^U, \sqsubseteq)$ is a finite lattice, where
$n^U = \{f : U \to [n-1]\}$ denotes the function space from $U$ to $[n-1]$ and
$f \sqsubseteq g$ if and only if for all $u \in U$, $f(u) \le g(u)$.
For $n = 2$, we obtain the powerset lattice $(2^U, \subseteq)$, also denoted by $P(U)$, with
least and greatest elements $\emptyset$ and $U$, respectively, and basis $\{\{u\} \mid u \in U\}$.
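
The Kleene-style iteration described above can be sketched for powerset lattices as follows. This is a minimal illustration, not part of the original development: the helper names `lfp` and `gfp` and the example relation `R` are assumptions chosen for the sketch, and subsets are represented as Python frozensets.

```python
def lfp(f):
    """Least fixpoint of a monotone set function, iterating from the empty set."""
    current = frozenset()
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


def gfp(f, top):
    """Greatest fixpoint of a monotone set function, iterating from `top`."""
    current = frozenset(top)
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


# Hypothetical example data: a successor relation R on U = {0, 1, 2, 3}.
U = {0, 1, 2, 3}
R = {0: {1}, 1: {2}, 2: {2}, 3: {3}}
target = {2}

# LFP X. target ∪ {u | R(u) ∩ X ≠ ∅}: nodes from which the target is reachable.
reach = lfp(lambda X: target | {u for u in U if R[u] & X})

# GFP X. {u ∉ target | R(u) ∩ X ≠ ∅}: nodes that can avoid the target forever.
avoid = gfp(lambda X: {u for u in U - target if R[u] & X}, U)
```

Both loops terminate on finite lattices because the iterates form a monotone chain when `f` is monotone; the sketch does not check monotonicity.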


Parity games A parity game $(V, E, \Omega)$ consists of a set of nodes $V$, a left-total
relation $E \subseteq V \times V$ of moves encoding the rules of the game, and a priority
function $\Omega : V \to \mathbb{N}$, which assigns priorities $\Omega(v) \in \mathbb{N}$ to nodes $v \in V$.
Moreover, each node belongs to exactly one of the two players Eloise or Abelard,
where we denote the set of Eloise’s nodes by $V_\exists$ and that of Abelard’s nodes
by $V_\forall$. A play $\rho \in V^\omega$ is an infinite sequence of nodes that follows the rules
of the game, that is, such that for all $i \ge 0$, we have $(\rho(i), \rho(i+1)) \in E$. We
say that an infinite play $\rho = v_0, v_1, \ldots$ is even if the largest priority that occurs
infinitely often in it (i.e. $\max(\mathrm{Inf}(\Omega \circ \rho))$) is even, and odd otherwise, and call
this property the parity of $\rho$. Player Eloise wins exactly the even plays and
player Abelard wins all other plays. A (history-free) Eloise-strategy $s : V_\exists \rightharpoonup V$
is a partial function that assigns single moves $s(x)$ to Eloise-nodes $x \in \mathrm{dom}(s)$.
Given an Eloise-strategy $s$, a play $\rho$ is an $s$-play if for all $i \in \mathrm{dom}(\rho)$ such that
$\rho(i) \in V_\exists$, we have $\rho(i+1) = s(\rho(i))$. An Eloise-strategy $s$ wins a node $v \in V$ if
Eloise wins all $s$-plays that start at $v$. We have a dual notion of Abelard-strategies;
solving a parity game consists in computing the winning regions $\mathrm{win}_\exists$ and $\mathrm{win}_\forall$
of the two players, that is, the sets of states that they respectively win by some
strategy.

It is known that solving parity games is in NP ∩ co-NP (and, more
specifically, in UP ∩ co-UP). Recently it has also been shown [9] that for parity games
with $n$ nodes and $k$ priorities, $\mathrm{win}_\exists$ and $\mathrm{win}_\forall$ can be computed in
quasipolynomial time $O(n^{\log k + 6})$. Another crucial property of parity games is that they are
history-free determined [21], that is, that every node in a parity game is won by 
exactly one of the two players and then there is a history-free strategy for the 
respective player that wins the node. 


3 Systems of Fixpoint Equations 


We now introduce our central notion, that is, systems of fixpoint equations over 
a finite lattice. Throughout, we fix a finite lattice $(L, \sqsubseteq)$ and a basis $B_L$ of $L$
such that $\bot \notin B_L$, and $k + 1$ monotone functions $f_i : L^{k+1} \to L$, $0 \le i \le k$.


Definition 3.1. A system of equations consists of $k + 1$ equations of the form

\[ X_i =_{\eta_i} f_i(X_0, \ldots, X_k) \]

where $\eta_i \in \{\mathrm{LFP}, \mathrm{GFP}\}$, briefly referred to as $f$. For a partial valuation
$\sigma : [k] \rightharpoonup L$, we inductively define

\[ [\![X_i]\!]^{\sigma} = \eta_i X_i.\, f_i^{\sigma}, \]

where the function $f_i^{\sigma}$ is given by

\[ f_i^{\sigma}(A) = f_i([\![X_0]\!]^{\sigma'}, \ldots, [\![X_{i-1}]\!]^{\sigma'}, A, \mathrm{ev}(\sigma', i+1), \ldots, \mathrm{ev}(\sigma', k)) \]

for $A \in L$, where $(\sigma[i \mapsto A])(j) = \sigma(j)$ for $j \neq i$ and $(\sigma[i \mapsto A])(i) = A$,
$\sigma' = \sigma[i \mapsto A]$ and where $\mathrm{ev}(\sigma, j) = \sigma(j)$ if $j \in \mathrm{dom}(\sigma)$ and $\mathrm{ev}(\sigma, j) = [\![X_j]\!]^{\sigma}$
otherwise (the latter clause handles free variables). Then, the solution of the
system of equations is $[\![X_k]\!]^{\epsilon}$ where $\epsilon : [k] \rightharpoonup L$ denotes the empty valuation
(i.e. $\mathrm{dom}(\epsilon) = \emptyset$). Similarly, we can obtain solutions for the other components
as $[\![X_i]\!]^{\epsilon}$ for $0 \le i < k$; we drop the valuation index if no confusion arises, and
sometimes write $[\![X_i]\!]_f$ to make the equation system $f$ explicit. We denote by $E_f$
the solution $[\![X_k]\!]$ for the canonical system of equations of the particular shape

\[ X_i =_{\eta_i} X_{i-1} \qquad X_0 =_{\mathrm{GFP}} f_0(X_0, \ldots, X_k), \]

where $0 < i \le k$, $\eta_i = \mathrm{LFP}$ for odd $i$ and $\eta_i = \mathrm{GFP}$ for even $i$.
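
As a small worked instance of Definition 3.1 (added here for illustration, using only the clauses of the definition), consider the canonical system for $k = 1$. The bound variable names $A$ and $B$ are chosen for this sketch. Unfolding the semantics from the outermost equation inwards gives a least fixpoint over a greatest fixpoint:

```latex
\begin{align*}
  X_1 &=_{\mathrm{LFP}} X_0, \qquad X_0 =_{\mathrm{GFP}} f_0(X_0, X_1), \\
  [\![X_1]\!]^{\epsilon}
      &= \mathrm{LFP}\, A.\ [\![X_0]\!]^{\epsilon[1 \mapsto A]}
      && \text{since } f_1(X_0, X_1) = X_0, \\
  [\![X_0]\!]^{\epsilon[1 \mapsto A]}
      &= \mathrm{GFP}\, B.\ f_0(B, A)
      && \text{since } \mathrm{ev}(\epsilon[1 \mapsto A][0 \mapsto B], 1) = A, \\
  E_f &= \mathrm{LFP}\, A.\ \mathrm{GFP}\, B.\ f_0(B, A).
\end{align*}
```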


Example 3.2. (1) Parity games and the modal μ-calculus: Let $(V, E, \Omega)$ be a
parity game with priorities $0$ to $k$, take $L = P(V)$, and consider the canonical
system of fixpoint equations $E_{f_\exists}$ for the function $f_\exists : P(V)^{k+1} \to P(V)$ given by

\[ f_\exists(V_0, \ldots, V_k) = \{v \in V_\exists \mid E(v) \cap V_{\Omega(v)} \neq \emptyset\} \cup \{v \in V_\forall \mid E(v) \subseteq V_{\Omega(v)}\} \]

for $(V_0, \ldots, V_k) \in P(V)^{k+1}$. It is well known that $\mathrm{win}_\exists = E_{f_\exists}$, i.e. parity games
can be solved by solving fixpoint equation systems. Intuitively, $v \in f_\exists(V_0, \ldots, V_k)$
iff Eloise can enforce that some node in $V_{\Omega(v)}$ is reached in the next step. The
nested fixpoint expressed by $E_{f_\exists}$ (in which least (greatest) fixpoints correspond
to odd (even) priorities) is constructed in such a way that Eloise only has to rely
infinitely often on an argument $V_i$ for odd $i$ if she can also ensure that some
argument $V_j$ for $j > i$ is used infinitely often.
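
To make the correspondence concrete, the following sketch evaluates the canonical nested fixpoint for $f_\exists$ by plain nested iteration (the exponential baseline, not the quasipolynomial method of this paper). The function names `solve_canonical` and `f_exists`, the dictionary-based game encoding, and the example game are assumptions made for this illustration.

```python
def solve_canonical(f0, k, universe):
    """Naive nested iteration for eta_k X_k. ... eta_0 X_0. f0(X_0, ..., X_k)
    over the powerset lattice of `universe`, where eta_i is LFP for odd i
    and GFP for even i (the canonical shape)."""
    universe = frozenset(universe)

    def solve(i, env):
        # env holds the current values of the outer variables X_{i+1}, ..., X_k.
        if i < 0:
            return frozenset(f0(*env))
        current = frozenset() if i % 2 == 1 else universe
        while True:
            nxt = solve(i - 1, (current,) + env)
            if nxt == current:
                return current
            current = nxt

    return solve(k, ())


def f_exists(edges, owner, priority):
    """Build f_E for a parity game; owner[v] is 'E' (Eloise) or 'A' (Abelard)."""
    def f(*Vs):
        result = set()
        for v, succs in edges.items():
            goal = Vs[priority[v]]
            if owner[v] == 'E' and succs & goal:
                result.add(v)       # Eloise picks one successor in the goal set
            elif owner[v] == 'A' and succs <= goal:
                result.add(v)       # every move of Abelard stays in the goal set
        return result
    return f


# Hypothetical game with priorities 0..2.
edges = {'a': {'b', 'c'}, 'b': {'a'}, 'c': {'c'}, 'd': {'c', 'a'}}
owner = {'a': 'E', 'b': 'A', 'c': 'A', 'd': 'A'}
priority = {'a': 2, 'b': 1, 'c': 1, 'd': 0}
win_eloise = solve_canonical(f_exists(edges, owner, priority), 2, edges)
```

In this example Eloise wins from `a` and `b` by alternating between them (the highest recurring priority is 2), while Abelard wins from `c` and `d` by staying in the odd self-loop at `c`.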

Model checking for the modal μ-calculus [29] and solving parity games are
linear-time equivalent problems. Formulae of the μ-calculus are evaluated over
Kripke frames $(U, R)$ with set of states $U$ and transition relation $R$. Formulae
$\varphi$ of the μ-calculus can be directly represented as equation systems over the
lattice $P(U)$ by recursively translating $\varphi$ to equations, mapping subformulae
$\mu X_i.\, \psi(X_0, \ldots, X_k)$ and $\nu X_j.\, \chi(X_0, \ldots, X_k)$ to equations

\[ X_i =_{\mu} \psi(X_0, \ldots, X_k) \qquad X_j =_{\nu} \chi(X_0, \ldots, X_k), \]

and interpreting the modalities $\Diamond$ and $\Box$ by functions

\[ f_{\Diamond}(X) = \{u \in U \mid R(u) \cap X \neq \emptyset\} \qquad f_{\Box}(X) = \{u \in U \mid R(u) \subseteq X\}. \]

The solution of the resulting system of equations then is the truth set of the
formula $\varphi$, that is, model checking for the modal μ-calculus reduces to solving
fixpoint equation systems. Furthermore, satisfiability checking for the modal
μ-calculus can be reduced to solving so-called satisfiability games [20], that is,
parity games that are played over the set of states of a determinized parity
automaton. These satisfiability games can be expressed as systems of fixpoint
equations, where the functions track transitions in the determinized automaton.
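
The two modal functions can be written down directly. The sketch below is illustrative only: the names `diamond`, `box`, and `iterate`, the example Kripke frame, and the two sample formulae are assumptions, not taken from the paper.

```python
def diamond(R, X):
    """f_diamond(X): states with at least one R-successor in X."""
    return frozenset(u for u, succs in R.items() if succs & X)


def box(R, X):
    """f_box(X): states all of whose R-successors lie in X."""
    return frozenset(u for u, succs in R.items() if succs <= X)


def iterate(f, start):
    """Iterate a monotone function from `start` until it stabilizes."""
    current = frozenset(start)
    while (nxt := frozenset(f(current))) != current:
        current = nxt
    return current


# Hypothetical Kripke frame with states 0, 1, 2 and an atom p true at 1 and 2.
R = {0: {1}, 1: {1, 2}, 2: {2}}
p = frozenset({1, 2})
goal = frozenset({2})

# mu X. goal or diamond X: states from which state 2 is reachable.
can_reach_goal = iterate(lambda X: goal | diamond(R, X), frozenset())

# nu X. p and box X: states from which p holds along all paths forever.
always_p = iterate(lambda X: p & box(R, X), R.keys())
```

Starting the iteration from the empty set yields the least fixpoint and starting from the full state set yields the greatest fixpoint, matching the translation of $\mu$ and $\nu$ above.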


(2) Energy parity games and the energy μ-calculus: Energy parity games [10] are
two-player games played over weighted game arenas $(V, E, w, \Omega)$, where
$w : E \to \mathbb{Z}$ assigns integer weights to edges. The winning condition is the combination
of a parity condition with a (quantitative) positivity condition on the sum of
the accumulated weights. It has been shown [2,10] that $b = n \cdot d \cdot W$ is a
sufficient upper bound on energy level accumulations in energy parity games
with $n$ nodes, $k$ priorities and maximum absolute weight $W$. We define a function
$f^{e}_{\exists} : ((b+1)^V)^{k+1} \to (b+1)^V$ over the finite lattice $(b+1)^V$ (whose elements are
functions from $V$ to the set $\{0, \ldots, b+1\}$) by putting

\[ (f^{e}_{\exists}(V_0, \ldots, V_k))(v) =
   \begin{cases}
     \min(\mathrm{en}(v, V_{\Omega(v)})) & \text{if } v \in V_\exists \\
     \max(\mathrm{en}(v, V_{\Omega(v)})) & \text{if } v \in V_\forall,
   \end{cases} \]

for $(V_0, \ldots, V_k) \in ((b+1)^V)^{k+1}$ and $v \in V$, using $\mathrm{en}(v, \sigma)$ as abbreviation for

\[ \mathrm{en}(v, \sigma) = \{n \in \{0, \ldots, b\} \mid \exists u \in E(v).\ n = \max\{0, \sigma(u) - w(v,u)\}\} \cup
   \{b+1 \mid \exists u \in E(v).\ \sigma(u) - w(v,u) > b \text{ or } \sigma(u) > b\}, \]

where $\sigma : V \to \{0, \ldots, b+1\}$. Then it follows from the results of [2] that player
Eloise wins a node $v$ in the energy parity game with minimal initial credit $c < b+1$
if $(E_{f^{e}_{\exists}})(v) = c$, that is, if the solution of the canonical equation system over $f^{e}_{\exists}$
maps $v$ to a value $c$ that is at most $b$.
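
A single application of the energy function can be sketched as follows. The code transcribes the two set comprehensions of $\mathrm{en}(v, \sigma)$ literally; the function names `en` and `f_energy`, the dictionary encoding of the arena, and the example arena are assumptions made for this illustration.

```python
def en(v, sigma, edges, weight, b):
    """The set en(v, sigma) of candidate energy values at node v."""
    values = set()
    for u in edges[v]:
        need = sigma[u] - weight[(v, u)]
        if max(0, need) <= b:
            values.add(max(0, need))   # credit needed at v to move to u
        if need > b or sigma[u] > b:
            values.add(b + 1)          # the bound b is exceeded via u
    return values


def f_energy(Vs, edges, weight, owner, priority, b):
    """One application of the energy function to a tuple Vs of energy maps."""
    out = {}
    for v in edges:
        candidates = en(v, Vs[priority[v]], edges, weight, b)
        # Eloise minimizes the required credit, Abelard maximizes it.
        out[v] = min(candidates) if owner[v] == 'E' else max(candidates)
    return out


# Hypothetical arena: x --(-2)--> y, y --(+1)--> y, bound b = 5.
edges = {'x': {'y'}, 'y': {'y'}}
weight = {('x', 'y'): -2, ('y', 'y'): 1}
owner = {'x': 'E', 'y': 'E'}
priority = {'x': 0, 'y': 0}
sigma = {'x': 0, 'y': 0}
step = f_energy((sigma,), edges, weight, owner, priority, 5)
```

Here moving from `x` to `y` costs two units, so `x` needs credit 2 to arrive at `y` with energy 0, while the self-loop at `y` gains energy and needs no credit.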

The energy μ-calculus [2] is the fixpoint logic that corresponds to energy
parity games. Its formulae are evaluated over weighted game structures and involve
operators $\Diamond_E \varphi$ and $\Box_E \varphi$ that are evaluated depending on the energy function
$[\![\varphi]\!] : V \to \{0, \ldots, b+1\}$ that is obtained by first evaluating the argument
formula $\varphi$. The semantics of the diamond operator then is an energy function that
assigns, to each state $v$, the least energy value $c \in \{0, \ldots, b+1\}$ such that there
is a move from $v$ to some node $u$ such that the credit $c$ suffices to take the
move from $v$ to $u$ and retain an energy level of at least $[\![\varphi]\!](u)$. Formulae can be
translated to equation systems over the finite lattice $(b+1)^V$, where the
functions for modal operators are defined according to their semantics as presented
in [2]. Solving these equation systems then amounts to model checking energy
μ-calculus formulae over weighted game structures.


(3) Latticed μ-calculi: In latticed μ-calculi [7], formulae are evaluated over
complete lattices $L$ rather than the powerset lattice; for finite lattices $L$, formulae of
latticed μ-calculi hence can be translated to fixpoint equation systems over $L$, so
that model checking reduces to solving equation systems. An associated latticed
variant of games has been introduced in [30] and for finite lattices $L$, solving
latticed parity games over $L$ reduces to solving equation systems over $L$.


(4) The coalgebraic μ-calculus and coalgebraic parity games: The coalgebraic
μ-calculus [12] supports generalized modal branching types by using predicate
liftings to interpret formulae over $T$-coalgebras, that is, over structures whose
transition type is specified by an endofunctor $T$ on the category of sets. For
instance the functors $T = P$, $T = D$ and $T = G$ map sets $X$ to their
powerset $P(X)$, the set of probability distributions $D(X) = \{f : X \to [0,1]\}$
over $X$, and to the set of multisets $G(X) = \{f : X \to \mathbb{N}\}$ over $X$, respectively.
The corresponding $T$-coalgebras then are Kripke frames (for $T = P$), Markov
chains (for $T = D$) and graded transition systems (for $T = G$), respectively.
Instances of the coalgebraic μ-calculus comprise, e.g. the two-valued probabilistic
μ-calculus [12,34] with modalities $\Diamond_p \varphi$ for $p \in [0,1]$, expressing ‘the next
state satisfies $\varphi$ with probability more than $p$’; the graded μ-calculus [32] with
modalities $\Diamond_g \varphi$ for $g \in \mathbb{N}$, expressing ‘there are more than $g$ successor states
that satisfy $\varphi$’; or the alternating-time μ-calculus [1] that is interpreted over
concurrent game frames and uses modalities $\langle D \rangle \varphi$ for finite $D \subseteq \mathbb{N}$ (encoding a
coalition) that express that ‘coalition $D$ has a joint strategy to enforce $\varphi$’.
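
For intuition, the graded and probabilistic modalities can be read as monotone set functions on the state space of a coalgebra. The following sketch is an illustration under assumed encodings: coalgebras are dictionaries mapping each state to a multiset (successor to multiplicity) or a distribution (successor to probability), and the names `graded_diamond` and `prob_diamond` are hypothetical.

```python
from fractions import Fraction


def graded_diamond(g, X, coalg):
    """States with more than g successors (counted with multiplicity) in X."""
    return {u for u, succ in coalg.items()
            if sum(m for v, m in succ.items() if v in X) > g}


def prob_diamond(p, X, coalg):
    """States whose successor distribution assigns X probability more than p."""
    return {u for u, dist in coalg.items()
            if sum((pr for v, pr in dist.items() if v in X), Fraction(0)) > p}


# Hypothetical graded transition system and Markov chain on states s, t, u.
multigraph = {'s': {'t': 2, 'u': 1}, 't': {'t': 1}, 'u': {}}
chain = {'s': {'t': Fraction(1, 3), 'u': Fraction(2, 3)},
         't': {'t': Fraction(1)},
         'u': {'u': Fraction(1)}}

more_than_one_t = graded_diamond(1, {'t'}, multigraph)
likely_u = prob_diamond(Fraction(1, 2), {'u'}, chain)
```

Exact rationals are used for probabilities so that threshold comparisons are not affected by floating-point rounding; both functions are monotone in `X`, which is what the fixpoint machinery requires.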

It has been shown in previous work [24] that model checking for coalgebraic 
μ-calculi against coalgebras with state space $U$ reduces to solving a canonical
fixpoint equation system over the powerset lattice P(U), where the involved func- 
tion interprets modal operators using predicate liftings, as described in [12, 24]. 
This canonical equation system can alternatively be seen as the winning region 
of Eloise in coalgebraic parity games, a highly general variant of parity games 
where the game structure is a coalgebra and nodes are annotated with modal- 
ities. Examples include two-valued probabilistic parity games and graded parity 
games in which nodes and edges are annotated with probabilities or grades, re- 
spectively. In order to win a node v, player Eloise then has to have a strategy 
that picks a set of moves to nodes that in turn are all won by Eloise, and such 
that the joint probability (joint grade) of the picked moves is greater than the 
probability (grade) that is assigned to v. It is known that solving coalgebraic 
parity games reduces to solving fixpoint equation systems [24]. 

Furthermore, the satisfiability problem of the coalgebraic μ-calculus has
been reduced to solving canonical fixpoint equation systems over lattices $P(U)$,
where $U$ is the state set of a determinized parity automaton and where the
innermost equation checks for joint one-step satisfiability of sets of coalgebraic
modalities [25]. By interpreting coalgebraic formulae over finite lattices $d^n$ rather than
over powerset lattices, one obtains the finite-valued coalgebraic μ-calculus (with
values $\{0, \ldots, d\}$), which has the finite-valued probabilistic μ-calculus (e.g. [36])
as an instance. Model checking for the finite-valued probabilistic μ-calculus hence
reduces to solving equation systems over the finite lattice $d^{|U|}$, where $\{0, \ldots, d\}$
encodes a finite set of probabilities.
4 Fixpoint Games and History-free Witnesses


We instantiate the existing notion of fixpoint games [4,42], which characterize 
solutions of equation systems, to our setting (that is, to finite lattices), and then 
use these games as a technical tool to establish our crucial notion of history- 
freeness for systems of fixpoint equations. 


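Definition 4.1 (Fixpoint games). Let $X_i =_{\eta_i} f_i(X_0, \ldots, X_k)$, $0 \le i \le k$, be
a system of fixpoint equations. The associated fixpoint game is a parity game
$(V, E, \Omega)$ with set of nodes $V = (B_L \times [k]) \cup L^{k+1}$, where nodes from $B_L \times [k]$
belong to player Eloise and nodes from $L^{k+1}$ belong to player Abelard. For nodes
$(u,i) \in B_L \times [k]$, we put

\[ E(u,i) = \{(U_0, \ldots, U_k) \in L^{k+1} \mid u \sqsubseteq f_i(U_0, \ldots, U_k)\}, \]

and for nodes $(U_0, \ldots, U_k) \in L^{k+1}$, we put

\[ E(U_0, \ldots, U_k) = \{(u,i) \in B_L \times [k] \mid u \sqsubseteq U_i\}. \]

The alternation depth $\mathrm{ad}(i)$ of an equation $X_i =_{\eta_i} f_i(X_0, \ldots, X_k)$ is defined as
$\mathrm{ad}_i^{\mu}$ if $\eta_i = \mu$ and as $\mathrm{ad}_i^{\nu}$ if $\eta_i = \nu$, where $\mathrm{ad}_i^{\mu}$, $\mathrm{ad}_i^{\nu}$ are recursively defined by

\[ \mathrm{ad}_i^{\mu} =
   \begin{cases}
     \mathrm{ad}_{i-1}^{\mu} & i > 0,\ \eta_{i-1} = \mu \\
     \mathrm{ad}_{i-1}^{\nu} + 1 & i > 0,\ \eta_{i-1} = \nu \\
     1 & i = 0
   \end{cases}
   \qquad
   \mathrm{ad}_i^{\nu} =
   \begin{cases}
     \mathrm{ad}_{i-1}^{\mu} + 1 & i > 0,\ \eta_{i-1} = \mu \\
     \mathrm{ad}_{i-1}^{\nu} & i > 0,\ \eta_{i-1} = \nu \\
     0 & i = 0
   \end{cases} \]

for $0 \le i \le k$. The priority function $\Omega : V \to [\mathrm{ad}(k)]$ then is defined by $\Omega(u,i) =
\mathrm{ad}(i)$ and $\Omega(U_0, \ldots, U_k) = 0$.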
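
The recursion for the alternation depth can be sketched as follows, under the reading that the depth increases exactly when the fixpoint type changes between consecutive equations, with least fixpoints receiving odd and greatest fixpoints even values. The function name `alternation_depths` and the string encoding of fixpoint types are assumptions of this sketch.

```python
def alternation_depths(etas):
    """Return [ad(0), ..., ad(k)] for etas[i] in {'LFP', 'GFP'}."""
    ad_mu, ad_nu = 1, 0                      # base case i = 0
    depths = [ad_mu if etas[0] == 'LFP' else ad_nu]
    for i in range(1, len(etas)):
        if etas[i - 1] == 'LFP':
            ad_mu, ad_nu = ad_mu, ad_mu + 1  # switching to GFP costs one level
        else:
            ad_mu, ad_nu = ad_nu + 1, ad_nu  # switching to LFP costs one level
        depths.append(ad_mu if etas[i] == 'LFP' else ad_nu)
    return depths


# Canonical shape: types alternate, so ad(i) = i.
canonical = alternation_depths(['GFP', 'LFP', 'GFP'])
# Two consecutive least fixpoints share one priority.
merged = alternation_depths(['GFP', 'LFP', 'LFP', 'GFP'])
```

Consecutive equations of the same fixpoint type share a priority, which is why fixpoint games built from `ad` may need fewer priorities than equations.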


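Remark 4.2. In [4], an alternative priority function $\Omega' : V \to [2k+1]$ with

\[ \Omega'(u,i) =
   \begin{cases}
     2i & \text{if } \eta_i = \mathrm{GFP} \\
     2i + 1 & \text{if } \eta_i = \mathrm{LFP}
   \end{cases} \]

and $\Omega'(U_0, \ldots, U_k) = 0$ is used. Since $\mathrm{ad}(i)$ is even if and only if $\Omega'(u,i)$ is even
(that is, if and only if $\eta_i = \mathrm{GFP}$), and
moreover $\mathrm{ad}(i) \le \mathrm{ad}(j)$ for $i \le j$, and $i < j$ whenever $\mathrm{ad}(i) < \mathrm{ad}(j)$, it is easy to
see that $\Omega$ and $\Omega'$ in fact assign identical parities to all plays. In the following,
we will use the more economic parity function $\Omega$ so that fixpoint games have
only $d := \mathrm{ad}(k) \le k$ priorities.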


We import the associated characterization theorem [4, Theorem 4.8]: 


Theorem 4.3 ([4]). We have $u \sqsubseteq [\![X_i]\!]_f$ if and only if Eloise wins the node
$(u,i)$ in the fixpoint game for the given system $f$ of equations.


Remark 4.4. While this shows that parity game solving can be used to solve 
equation systems, the size of fixpoint games is exponential in $|B_L|$, so they do
not directly yield a quasipolynomial algorithm for solving equation systems. 
Next we define our notion of history-freeness for systems of fixpoint equations.


Definition 4.5 (History-free witness). A history-free witness for $u \sqsubseteq [\![X_i]\!]_f$
is an even labelled graph $(W, R)$ with labels from $[d]$ such that $W \subseteq B_L \times
[d]$, $(u,i) \in W$, and for all $(v,p) \in W$, we have $v \sqsubseteq f_p(U_0, \ldots, U_k)$ where
$U_j = \bigsqcup \pi_1[R_{\mathrm{ad}(j)}(v,p)]$ for $0 \le j \le k$, noting that $R_{\mathrm{ad}(j)}(v,p) \subseteq W$ so that
$\pi_1[R_{\mathrm{ad}(j)}(v,p)] \subseteq B_L$, and $U_j \in L$.


In analogy to history-free strategies for parity games, history-free witnesses
assign tuples $(R_0(v,p), \ldots, R_d(v,p))$ of sets $R_i(v,p) \subseteq W$ to pairs $(v,p) \in W$
without relying on a history of previously visited pairs. We have $|W| \le (d+1)|B_L|$
and $|R| \le (d+1)|W|^2$, that is, the size of history-free witnesses is polynomial in
$|B_L|$. Crucially, history-free witnesses always exist:


Lemma 4.6. For all $u \in B_L$ and $i \in [k]$, we have

$u \sqsubseteq [\![X_i]\!]_f$ if and only if there is a history-free witness for $u \sqsubseteq [\![X_i]\!]_f$.


Proof. In one direction, we have $u \sqsubseteq [\![X_i]\!]_f$ so that Eloise wins the node $(u,i)$
in the according fixpoint game by Theorem 4.3. Let $s$ be a corresponding
history-free winning strategy (such strategies always exist, see e.g. [21]). We inductively
construct a witness for $u \sqsubseteq [\![X_i]\!]_f$, starting at $(u,i)$. When at $(v,p) \in B_L \times [k]$
with $s(v,p) = (U_0, \ldots, U_k)$, we put $R_i(v,p) = \bigcup_{j \mid \mathrm{ad}(j) = i} (U_j \times \{j\})$ for $0 \le i \le d$
and hence have $\mathrm{ad}(j) = i$ for all $((v,p), i, (u,j)) \in R$. Since $s$ is a winning
strategy, the resulting graph $(W, R)$ is a history-free witness for $u \sqsubseteq [\![X_i]\!]_f$ by
construction; in particular, $(W, R)$ is even. For the converse direction, the witness
for $u \sqsubseteq [\![X_i]\!]_f$ directly yields a winning Eloise-strategy for the node $(u,i)$ in the
associated fixpoint game. This implies $u \sqsubseteq [\![X_i]\!]_f$ by Theorem 4.3.


5 Solving Equation Systems using Universal Graphs 


We go on to prove our main result. To this end, we fix a system $f$ of fixpoint
equations $f_i : L^{k+1} \to L$, $0 \le i \le k$, and put $n := |B_L|$ and $d := \mathrm{ad}(k)$ for the
remainder of the paper.


Definition 5.1 (Universal graphs [13, 14]). Let $G = (W, R)$ and $G' =
(W', R')$ be labelled graphs with labels from $[d]$. A homomorphism of labelled
graphs from $G$ to $G'$ is a function $\Phi : W \to W'$ such that for all $(v, p, w) \in R$,
we have $(\Phi(v), p, \Phi(w)) \in R'$. An $(n, d+1)$-universal graph $S$ is an even graph
with labels from $[d]$ such that for all even graphs $G$ with labels from $[d]$ and with
$|G| \le n$, there is a homomorphism from $G$ to $S$.
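
For finite labelled graphs, both notions in Definition 5.1 are easy to check mechanically. The sketch below is illustrative: graphs are encoded as sets of triples `(v, p, w)`, the names `is_homomorphism` and `is_even` are hypothetical, and the evenness test relies on the observation that a finite graph is even exactly if no cycle has an odd maximal label.

```python
def is_homomorphism(phi, R, R_prime):
    """Check that phi maps every edge (v, p, w) of R to an edge of R_prime."""
    return all((phi[v], p, phi[w]) in R_prime for (v, p, w) in R)


def is_even(R):
    """Check that every cycle of the finite labelled graph R has an even
    maximal label."""
    for (v, p, w) in R:
        if p % 2 == 0:
            continue
        # Search for a path from w back to v that only uses labels <= p.
        seen, stack = {w}, [w]
        while stack:
            x = stack.pop()
            if x == v:
                return False          # found a cycle whose maximal label p is odd
            for (a, q, b) in R:
                if a == x and q <= p and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return True


# Hypothetical even graph G and a small target graph S on counters 0..2:
# label 0 allows the counter to stay or drop, label 1 forces it to drop.
G = {('x', 0, 'y'), ('y', 1, 'z'), ('z', 0, 'z')}
S = {(c, 0, c2) for c in range(3) for c2 in range(c + 1)} | \
    {(c, 1, c2) for c in range(3) for c2 in range(c)}
phi = {'x': 1, 'y': 1, 'z': 0}
```

The map `phi` sends each node of `G` to the number of odd edges that can still be taken from it, which is one simple way to obtain a homomorphism into a counter-based target graph for labels 0 and 1.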


We fix an (n(d + 1), (d + 1))-universal graph S = (Z, K), noting that there 
are (n(d + 1), (d + 1))-universal graphs (obtained from universal trees) of size 
quasipolynomial in n and d [14]. We now combine the system f with the uni- 
versal graph S to turn the parity conditions associated to general systems of 
fixpoint equations into a safety condition, associated to a single greatest fixpoint 
equation. 
Definition 5.2 (Chained-product fixpoint). We define a function

\[ g : P(B_L \times [k] \times Z) \to P(B_L \times [k] \times Z) \]
\[ U \mapsto \{(v, p, q) \in B_L \times [k] \times Z \mid v \sqsubseteq f_p(P_0^{U,q}, \ldots, P_k^{U,q})\} \]

where

\[ P_i^{U,q} = \bigsqcup \{u \in B_L \mid \exists s \in K_{\mathrm{ad}(i)}(q).\ (u, i, s) \in U\}. \]

We refer to $Y_0 =_{\mathrm{GFP}} g(Y_0)$ as the chained-product fixpoint (equation) of $f$ and $S$.
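
The chained-product construction can be tried out on a toy instance. The sketch below works over a powerset lattice (basis elements are identified with the underlying elements) and, purely as an assumption for this illustration, uses a simple linear-size counter graph that is universal for labels 0 and 1 instead of the quasipolynomial universal graphs of [14]. The function name `chained_product_gfp` and the example system are hypothetical.

```python
from itertools import product


def chained_product_gfp(basis, fs, ad, Z, K):
    """Greatest fixpoint of g over a powerset lattice.

    fs[p] takes a tuple of k+1 frozensets and returns a set; ad[i] is the
    alternation depth of equation i; K(q, label) is the set of successors
    of the universal-graph node q under the given label."""
    k = len(fs) - 1
    Y = set(product(basis, range(k + 1), Z))      # top element

    def P(i, q):
        allowed = K(q, ad[i])
        return frozenset(u for (u, j, s) in Y if j == i and s in allowed)

    while True:
        new = set()
        for q in Z:
            args = tuple(P(i, q) for i in range(k + 1))
            for p in range(k + 1):
                image = fs[p](args)
                new |= {(v, p, q) for v in basis if v in image}
        if new == Y:
            return Y
        Y = new


# Hypothetical system X_1 =_LFP X_0, X_0 =_GFP f0(X_0, X_1) for a one-player
# parity game: node 0 loops on priority 0, node 1 moves to 0, node 2 loops
# on priority 1.
V = [0, 1, 2]
E = {0: {0}, 1: {0}, 2: {2}}
prio = {0: 0, 1: 1, 2: 1}
f0 = lambda args: {v for v in V if E[v] & args[prio[v]]}
f1 = lambda args: args[0]

n, d = len(V), 1
Z = range(n * (d + 1))                            # counters 0 .. n(d+1)-1
K = lambda q, label: {s for s in Z if (s < q if label % 2 else s <= q)}

Y = chained_product_gfp(V, [f0, f1], [0, 1], Z, K)
solution = {v for (v, i, q) in Y if i == 1}
```

Projecting the single greatest fixpoint to the outermost equation recovers the nodes 0 and 1, the same set that the nested fixpoint $\mathrm{LFP}\, A.\ \mathrm{GFP}\, B.\ f_0(B, A)$ yields for this example; node 2 is eliminated because staying on its odd loop would require the counter to decrease forever.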


We now show our central result: apart from the annotation with states from the 
universal graph, the chained-product fixpoint g is the solution of the system f. 


Theorem 5.3. For all $u \in B_L$ and $0 \le i \le k$, we have

$u \sqsubseteq [\![X_i]\!]_f$ if and only if there is $q \in Z$ such that $(u, i, q) \in [\![Y_0]\!]_g$.


Proof. For the forward direction, let $u \sqsubseteq [\![X_i]\!]_f$. By Lemma 4.6, there is a
history-free witness $G = (W, R)$ for $u \sqsubseteq [\![X_i]\!]_f$. Since $S$ is a $(n(d+1), d+1)$-universal
graph and since $G$ is a witness and hence an even labelled graph of suitable
size $|G| \le n(d+1)$, there is a graph homomorphism $\Phi$ from $G$ to $S$.
Starting at $(u, i, \Phi(u,i), 0)$, we inductively construct a witness for containment of
$(u, i, \Phi(u,i))$ in $[\![Y_0]\!]_g$. When at $(v_1, p_1, \Phi(v_1,p_1), 0)$ with $(v_1, p_1) \in W$, we put

\[ R'_0(v_1, p_1, \Phi(v_1,p_1), 0) = \{(v_2, p_2, \Phi(v_2,p_2), 0) \in B_L \times [d] \times Z \times [0] \mid
   (v_2, p_2) \in R_{\mathrm{ad}(p_2)}(v_1, p_1),\ \Phi(v_2,p_2) \in K_{\mathrm{ad}(p_2)}(\Phi(v_1,p_1))\} \]

and continue the inductive construction with all these $(v_2, p_2, \Phi(v_2,p_2), 0)$,
having $(v_2, p_2) \in W$. The resulting structure $G' = (W', R')$ indeed is a witness
for containment of $(u, i, q)$ in $[\![Y_0]\!]_g$ with $q = \Phi(u,i)$: $G'$ is even by construction. Moreover, we
need to show that for $(v_1, p_1, \Phi(v_1,p_1), 0) \in W'$, we have $(v_1, p_1, \Phi(v_1,p_1), 0) \in
g(\pi_1[R'_0(v_1, p_1, \Phi(v_1,p_1), 0)])$, i.e. $v_1 \sqsubseteq f_{p_1}(P_0^{U,\Phi(v_1,p_1)}, \ldots, P_k^{U,\Phi(v_1,p_1)})$ where
$U = \pi_1[R'_0(v_1, p_1, \Phi(v_1,p_1), 0)]$. Since $G$ is a witness and $(v_1, p_1) \in W$ by
construction of $W'$, we have $v_1 \sqsubseteq f_{p_1}(U_0, \ldots, U_k)$ where $U_j = \bigsqcup(\pi_1[R_{\mathrm{ad}(j)}(v_1, p_1)])$.

By monotonicity of $f_{p_1}$, it thus suffices to show that $U_j \sqsubseteq P_j^{U,\Phi(v_1,p_1)}$ for
$0 \le j \le k$; by definition of $P_j^{U,\Phi(v_1,p_1)}$ this follows if

\[ \pi_1[R_{\mathrm{ad}(j)}(v_1, p_1)] \subseteq \{u \in B_L \mid \exists s \in K_{\mathrm{ad}(j)}(\Phi(v_1,p_1)).\ (u, j, s) \in U\}, \]

where $U = \pi_1[R'_0(v_1, p_1, \Phi(v_1,p_1), 0)]$ as before. So let $w \in B_L$ such that $w \in \pi_1[R_{\mathrm{ad}(j)}(v_1, p_1)]$.
Since $R$ is a witness that is constructed as in the proof of Lemma 4.6, we
have $i = \mathrm{ad}(j')$ for all $((v', p'), i, (w', j')) \in R$. Thus $(w, j) \in R_{\mathrm{ad}(j)}(v_1, p_1)$
for some $j$ such that $\mathrm{ad}(j) = i$, that is, $((v_1, p_1), \mathrm{ad}(j), (w, j)) \in R$, hence
$(\Phi(v_1,p_1), \mathrm{ad}(j), \Phi(w,j)) \in K$ because $\Phi$ is a graph homomorphism. By
definition of $R'_0$ we have $(w, j, \Phi(w,j), 0) \in R'_0(v_1, p_1, \Phi(v_1,p_1), 0)$ so that
$(w, j, \Phi(w,j)) \in \pi_1[R'_0(v_1, p_1, \Phi(v_1,p_1), 0)]$. We are done since $\Phi(w,j) \in
K_{\mathrm{ad}(j)}(\Phi(v_1,p_1))$.
For the converse implication, let $(u_0,p_0,q_0) \in [\![Y_0]\!]_g$ for some $q_0 \in Z$. Let
$G = (W, R)$ be a history-free witness for this fact. By Lemma 4.3, it suffices
to provide a strategy in the fixpoint game for the system f with which Eloise
wins the node $(u_0,p_0)$. We inductively construct a history-dependent strategy s
as follows: For $i \ge 0$, we abbreviate $U_i = R_0(u_i,p_i,q_i,0)$. We put
$s(u_0,p_0) = (P_0^{U_0,q_0}, \ldots, P_k^{U_0,q_0})$. For the inductive step, let

$\tau = (u_0,p_0), (P_0^{U_0,q_0}, \ldots, P_k^{U_0,q_0}), \ldots, (P_0^{U_{n-1},q_{n-1}}, \ldots, P_k^{U_{n-1},q_{n-1}}), (u_n,p_n)$

be a partial play of the fixpoint game that follows the strategy that has been
constructed so far. Then we have an R-path
$(u_0,p_0,q_0,0), (u_1,p_1,q_1,0), \ldots, (u_n,p_n,q_n,0)$, where, for $0 \le i < n$, we
have $(q_i,p_{i+1},q_{i+1}) \in K$ since $u_{i+1} \sqsubseteq P_{p_{i+1}}^{U_i,q_i}$ by the inductive construction.

Put $s(\tau) = (P_0^{U_n,q_n}, \ldots, P_k^{U_n,q_n})$. Since G is a witness, the strategy uses only
moves that are available to Eloise (i.e. ones with $u_n \sqsubseteq f_{p_n}(s(\tau))$). Also, s is a
winning strategy as can be seen by looking at the K-paths that are induced by
complete plays $\tau$ that follow s, as described (for partial plays) above. Since S is
a universal graph and hence even, every such K-path is even and the sequence
of priorities in $\tau$ is just the sequence of priorities of one of these K-paths.


Remark 5.4. Since the set $[\![Y_0]\!]_g$ is the greatest fixpoint of g, it can be computed
by simple approximation from above, that is, as $g^m(B_L \times [k] \times Z)$ where
$m = |B_L \times [k] \times Z|$. However, each iteration of the function g may require up to |Z|
evaluations of an equation. In the next section, we will show how this additional
iteration factor in the computation of $[\![Y_0]\!]_g$ can be avoided.


6 A Progress Measure Algorithm 


We next introduce a lifting algorithm that computes the set $[\![Y_0]\!]_g$ efficiently,
following the paradigm of the progress measure approach for parity games
(e.g. [27,28]). Our progress measures will map pairs $(u,i) \in B_L \times [k]$ to nodes in
a universal graph that is equipped with a simulation order, that is, a total order
that is suitable for measuring progress.


Definition 6.1 (Simulation order). For natural numbers $i, i'$, we put $i \succeq i'$
if and only if either i is even and $i = i'$, or both $i$ and $i'$ are odd and $i \ge i'$. A
total order $\le$ on Z is a simulation order if for all $q, q' \in Z$,


$q \le q'$ implies that for all $0 \le i \le k$ and $s \in K_i(q)$, there are
$i' \succeq i$ and $s' \in K_{i'}(q')$ such that $s \le s'$.


Lemma 6.2. There is an (n(d + 1), d + 1)-universal graph (Z, K) of size
quasipolynomial in n and d, and over which a simulation order $\le$ exists.


Proof (Sketch). It has been shown [14, Theorem 2.2] (originally, in different
terminology, [28]) that there are (l, h)-universal trees (a concept similar to, but
slightly more concrete than universal graphs) with set of leaves T such that
$|T| \le 2l\binom{\lceil \log l \rceil + h + 1}{h}$. Leaves in universal trees are identified by navigation paths,
that is, sequences of branching directions, so that the leaves are linearly ordered
by the lexicographic order $\le$ on navigation paths (which orders leaves from the
left to the right). As described in [13], one can obtain a universal graph (T, K)
over T in which transitions $(q, i, q') \in K$ for odd i (the crucial case) move to
the left, that is, $q'$ is a leaf that is to the left of q in the universal tree (so
that $q' < q$), ensuring universality. As it turns out, the lexicographic ordering
on T is a simulation order. Adapting this construction to our setting, we put
$l = n(d+1)$ and $h = d+1$ and obtain an (n(d+1), d+1)-universal graph (along
with a simulation order $\le$) of size at most $2n(d+1)\binom{\lceil \log(n(d+1)) \rceil + d + 2}{d+1}$, which is
quasipolynomial in n and d.


We fix an (n(d+1), d+1)-universal graph (Z, K) and a simulation order $\le$ on Z
for the remainder of the paper (these exist by the above lemma).


Definition 6.3 (Progress measure, lifting function). We let $q_{\min} \in Z$ denote
the least node w.r.t. $\le$ and fix a distinguished top element $\star \notin Z$, and
extend $\le$ to $Z \cup \{\star\}$ by putting $q \le \star$ for all $q \in Z$. A measure is a map
$\mu : B_L \times [k] \to Z \cup \{\star\}$, i.e. assigns nodes in the universal graph or $\star$ to pairs
$(v,p) \in B_L \times [k]$. A measure $\mu$ is a progress measure if whenever $\mu(v,p) \neq \star$,
then $v \sqsubseteq f_p(U_0^{\mu,q}, \ldots, U_k^{\mu,q})$ where $q = \mu(v,p)$ and


$U_i^{\mu,q} = \bigsqcup \{u \in B_L \mid \exists s \in K_{\mathrm{ad}(i)}(q).\ \mu(u,i) \le s\}.$


We define a function $\mathrm{Lift} : (B_L \times [k] \to Z \cup \{\star\}) \to (B_L \times [k] \to Z \cup \{\star\})$ on
measures by


$(\mathrm{Lift}(\mu))(v,p) = \min\{q \in Z \mid v \sqsubseteq f_p(U_0^{\mu,q}, \ldots, U_k^{\mu,q})\}$


where $\min(Z')$ denotes the least element of $Z'$ w.r.t. $\le$, for $\emptyset \neq Z' \subseteq Z$; also we
put $\min(\emptyset) = \star$.


The lifting algorithm then starts with the least measure $\mu_{\min}$ that maps all pairs
$(v,p) \in B_L \times [k]$ to the minimal node (i.e. $\mu_{\min}(v,p) = q_{\min}$) and repeatedly
updates the current measure using Lift until the measure stabilizes.


Lifting algorithm

(1) Initialize: Put $\mu := \mu_{\min}$.

(2) If $\mathrm{Lift}(\mu) \neq \mu$, then put $\mu := \mathrm{Lift}(\mu)$ and go to 2. Otherwise go to 3.
(3) Return the set $E = \{(v,p) \in B_L \times [k] \mid \mu(v,p) \neq \star\}$.
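
The three steps above can be sketched in Python for the special case of a
powerset lattice, where basis elements are the elements of a finite set, joins
are unions, and "below" is membership. The function name, the integer encoding
of the universal graph (nodes 0..num_nodes-1 ordered numerically, with num_nodes
standing in for the top element) and the toy reachability instance are all
assumptions made for illustration; in particular the toy graph is hand-picked
and is not the quasipolynomial construction of Lemma 6.2.

```python
def lifting(basis, k, num_nodes, K, ad, f):
    """Toy lifting algorithm over the powerset lattice of `basis`.

    K maps (node, priority) to a set of successor nodes, ad maps an
    equation index to its priority, and f(p, args) evaluates equation p
    on a tuple of k+1 sets.  The value num_nodes plays the role of the
    top element of the measure domain.
    """
    top = num_nodes
    basis = list(basis)
    mu = {(v, p): 0 for v in basis for p in range(k + 1)}  # least measure

    def arguments(mu, q):
        # i-th argument: all u whose measure at index i lies below
        # some K-successor of q along priority ad(i)
        return tuple(
            frozenset(u for u in basis
                      if any(mu[(u, i)] <= s for s in K.get((q, ad(i)), ())))
            for i in range(k + 1))

    def lift(mu):
        args = [arguments(mu, q) for q in range(num_nodes)]
        return {(v, p): next((q for q in range(num_nodes)
                              if v in f(p, args[q])), top)
                for (v, p) in mu}

    while True:
        lifted = lift(mu)
        if lifted == mu:  # measure has stabilized
            return {pair for pair, q in mu.items() if q != top}
        mu = lifted


# Hypothetical toy instance: least fixpoint of
#   X = {c} union {v | v has a successor in X}   (reachability of c)
succ = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"d"}}

def reach(p, args):
    return {"c"} | {v for v in succ if succ[v] & args[0]}

# Edges of odd priority 1 strictly decrease the node number.
K = {(q, 1): set(range(q)) for q in range(4)}
result = lifting("abcd", 0, 4, K, lambda i: 1, reach)
print(sorted(result))
```

On this instance the measure of d is lifted step by step until it reaches the
top element, so only a, b and c remain in the returned set. The sketch
recomputes all argument tuples in every round and makes no attempt to match the
evaluation bounds discussed in the text.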


Lemma 6.4 (Correctness). For all $v \in B_L$ and $0 \le p \le k$, we have


$(v,p) \in E$ if and only if $v \sqsubseteq [\![X_p]\!]_f$.
Proof (Sketch). Let $\mu$ denote the progress measure that the algorithm computes.
For one direction of the proof, let $(v,p) \in E$. By Lemma 4.6 it suffices to construct
a witness for $v \sqsubseteq [\![X_p]\!]_f$. We extract such a witness (E, R) from the progress
measure $\mu$, relying on the properties of the simulation order $\le$ that is used
to measure the progress of $\mu$ to ensure that any infinite sequence of measures
that $\mu$ assigns to some R-path induces an infinite (and hence even) path in the
employed universal graph. This shows that (E, R) indeed is an even graph and
hence a witness. For the converse direction, let $v \sqsubseteq [\![X_p]\!]_f$ so that there is, by
Theorem 5.3, some $q \in Z$ such that $(v,p,q) \in [\![Y_0]\!]_g$. For (u, i) such that there is
$q' \in Z$ such that $(u,i,q') \in [\![Y_0]\!]_g$, let $q_{(u,i)} \in Z$ denote the minimal such node
w.r.t. $\le$. It now suffices to show that $\mu(u,i) \le q_{(u,i)}$ for all such (u, i), which is
shown by induction on the number of iterations of the lifting algorithm.


Corollary 6.5. Solutions of systems of fixpoint equations can be computed with 
quasipolynomially many evaluations of equations. 


Proof. Given an (n(d + 1), d + 1)-universal graph (Z, K) and a simulation order
on Z, the lifting algorithm terminates and returns the solution of f after at
most $n(d+1) \cdot |Z|$ many iterations. This is the case since each iteration (except
the final iteration) increases the measure for at least one of the n(d + 1) nodes
and the measure of each node can be increased at most |Z| times. Using the
universal graph and the simulation order from the proof of Lemma 6.2, we have
$|Z| \le 2n(d+1)\binom{\lceil \log(n(d+1)) \rceil + d + 2}{d+1}$ so that the algorithm terminates after at most
$2(n(d+1))^2\binom{\lceil \log(n(d+1)) \rceil + d + 2}{d+1} \in O((n(d+1))^{\log(d+1)})$ iterations of the function
Lift. Each iteration can be implemented to run with at most n(d + 1) evaluations
of an equation.


Corollary 6.6. The number of function calls required for the solution of systems
of fixpoint equations with $d \le \log n$ is bounded by a polynomial in n and d.


Proof. Following the insight of Theorem 2.8 in [9], Theorem 2.2 in [14] implies
that if $d \le \log n$, then there is an (n(d+1), d+1)-universal tree of size polynomial
in n and d. In the same way as in the proof of Lemma 6.2, one obtains a universal
graph of polynomial size and a simulation order on it.


Example 6.7. Applying Corollary 6.5 and Corollary 6.6 to Example 3.2, we 
obtain the following results: 


(1) The model checking problems for the energy μ-calculus and finite latticed
μ-calculi are in QP. For energy parity games with sufficient upper bound b on
energy level accumulations, we obtain a progress measure algorithm that
terminates after a number of iterations that is quasipolynomial in b.


(2) Under mild assumptions on the modalities (see [24]), the model checking
problem for the coalgebraic μ-calculus is in QP; in particular, this yields QP
model checking algorithms for the graded μ-calculus and the two-valued
probabilistic μ-calculus (equivalently: QP progress measure algorithms for solving
graded and two-valued probabilistic parity games).

(3) Under mild assumptions on the modalities (see [25]), we obtain a novel upper
bound $2^{O(nd \log n)}$ for the satisfiability problems of coalgebraic μ-calculi, in
particular including the monotone μ-calculus, the alternating-time μ-calculus, the
graded μ-calculus and the (two-valued) probabilistic μ-calculus, even when the
latter two are extended with (monotone) polynomial inequalities. This improves
on the best previous bounds in all cases.


7 Conclusion 


We have shown how to use universal graphs to compute solutions of systems of
fixpoint equations $X_i =_{\eta_i} f_i(X_0, \ldots, X_k)$ (with the $\eta_i$ marking least or greatest
fixpoints) that use functions $f_i : L^{k+1} \to L$ (over a finite lattice L with basis
$B_L$) and involve up to (k + 1)-fold nesting of fixpoints. Our progress measure
algorithm needs quasipolynomially many evaluations of equations, and runs in
time $O(q \cdot t(f))$, where q is a quasipolynomial in $|B_L|$ and the alternation depth
of the equation system, and where $t(f)$ is an upper bound on the time it takes
to compute $f_i$ for all i.

As a consequence of our results, the upper time bounds for the evaluation
of various general parity conditions improve. Example domains beyond solving
parity games to which our algorithm can be instantiated comprise model
checking for latticed μ-calculi and solving latticed parity games [7,30], solving
energy parity games and model checking for the energy μ-calculus [2,10], and
model checking and satisfiability checking for the coalgebraic μ-calculus [12].
The resulting model checking algorithms for latticed μ-calculi and the energy
μ-calculus run in time quasipolynomial in the provided basis of the respective
lattice. In terms of concrete instances of the coalgebraic μ-calculus, we obtain,
e.g., quasipolynomial-time model checking for the graded [32] and the
probabilistic μ-calculus [12,34] as new results (corresponding results for, e.g., the
alternating-time μ-calculus [1] and the monotone μ-calculus [18] follow as well
but have already been obtained in our previous work [24]), as well as improved
upper bounds for satisfiability checking in the graded μ-calculus, the
probabilistic μ-calculus, the monotone μ-calculus, and the alternating-time μ-calculus. We
foresee further applications, e.g. in the computation of fair bisimulations and fair
equivalence [26,31] beyond relational systems, e.g. for probabilistic systems.

As in the case of parity games, a natural open question that remains is 
whether solutions of fixpoint equations can be computed in polynomial time 
(which would of course imply that parity games can be solved in polynomial 
time). A more immediate perspective for further investigation is to generalize 
the recent quasipolynomial variant [38] of Zielonka’s algorithm [43] for solving 
parity games to solving systems of fixpoint equations, with a view to improving 
efficiency in practice. 
Abstract. We introduce FRAT, a new proof format for unsatisfiable
SAT problems, and its associated toolchain. Compared to DRAT, the 
FRAT format allows solvers to include more information in proofs to re- 
duce the computational cost of subsequent elaboration to LRAT. The 
format is easy to parse forward and backward, and it is extensible to 
future proof methods. The provision of optional proof steps allows SAT 
solver developers to balance implementation effort against elaboration 
time, with little to no overhead on solver time. We benchmark our FRAT 
toolchain against a comparable DRAT toolchain and confirm >84% me- 
dian reduction in elaboration time and >94% median decrease in peak 
memory usage. 
1 Introduction


The Boolean satisfiability problem is the problem of determining, for a given
Boolean formula consisting of Boolean variables and connectives, whether there 
exists a variable assignment under which the formula evaluates to true. Boolean 
satisfiability (SAT) is interesting in part because there are surprisingly diverse 
types of problems that can be encoded as Boolean formulas and solved efficiently 
by checking their satisfiability. SAT solvers, programs that automatically solve 
SAT problems, have been successfully applied to a wide range of areas, including 
hardware verification [2], planning [14], and combinatorics [12]. 

The performance of SAT solvers has taken great strides in recent years, 
and modern solvers can often solve problems involving millions of variables and 
clauses, which would have been unthinkable a mere 20 years ago [15]. But this 
improvement comes at the cost of significant increase in the code complexity 
of SAT solvers, which makes it difficult to either assume their correctness on 
faith, or certify their program correctness directly. As a result, the ability of 
SAT solvers to produce independently verifiable certificates has become a press- 
ing necessity. Since there is an obvious certificate format (the satisfying boolean 
assignment) for satisfiable problems, the real challenge in proof-producing SAT 
solving is in devising a compact proof format for unsatisfiable problems, and
developing a toolchain that efficiently produces and verifies it. 


The current de facto standard proof format for unsatisfiable SAT problems 
is DRAT [10]. The format, as well as its predecessor DRUP, were designed with 
a strong focus on quick adaptation by the community, emphasizing easy proof 
emission, practically zero overhead, and reasonable validation speed [11]. The 
DRAT format has become the only supported proof format in SAT Competition 
and Races since 2014 due to entrants losing interest in alternatives. 

DRAT is a clausal proof format [6], which means that a DRAT proof consists 
of a sequence of instructions for adding and deleting clauses. It is helpful to think 
of a DRAT proof as a program for modifying the ‘active multiset’ of clauses: the 
initial active multiset is the clauses of the input problem, and this multiset grows 
and shrinks over time as the program is executed step by step. The invariant 
throughout program execution is that the active multiset at any point of time is 
at least as satisfiable as the initial active multiset. This invariant holds trivially 
in the beginning and after a deletion; it is also preserved by addition steps by 
either RUP or RAT, which we explain shortly. The last step of a DRAT proof 
is the addition of the empty clause, which ensures the unsatisfiability of the 
final active multiset, and hence that of the initial active multiset, i.e. the input 
problem. 

Every addition step in DRAT is either a reverse unit propagation (RUP)
step [6] or a resolution asymmetric tautology (RAT) step [13]. A clause C has the
property AT (asymmetric tautology) with respect to a formula F if $F, \overline{C} \vdash_1 \bot$,
which is to say, there is a proof of the empty clause by unit propagation using F
and the negated literals in C. A RUP step that adds C to the active multiset F
is valid if C has property AT with respect to F. A clause $l \lor C$ has property RAT
with respect to F if for every clause $\overline{l} \lor D \in F$, the clause $C \lor D$ has property
AT with respect to F. In this case, C is not logically entailed by F, but F and
$F \land C$ are equisatisfiable, and a RAT step will add C to the active multiset if C
has property RAT with respect to F. (See [10] for more about the justification
for this proof system.)
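
The AT check described above can be illustrated with a small Python sketch.
It is a naive, quadratic unit propagation written for readability, not how a
production DRAT checker is implemented; the function name and the clause
representation (lists of nonzero integers, as in DIMACS) are assumptions, and
the formula is the input problem of Figure 1.

```python
def has_at(formula, clause):
    """Return True if `clause` has property AT w.r.t. `formula`.

    Literals are nonzero ints; -x is the negation of x.  We assume the
    negation of every literal in `clause` and propagate units until a
    clause is falsified (conflict) or nothing changes.  The clause is
    assumed not to contain a literal together with its negation.
    """
    assignment = {-lit for lit in clause}  # literals currently assumed true
    changed = True
    while changed:
        changed = False
        for c in formula:
            if any(lit in assignment for lit in c):
                continue  # clause already satisfied
            open_lits = {lit for lit in c if -lit not in assignment}
            if not open_lits:
                return True  # every literal is false: conflict reached
            if len(open_lits) == 1:
                assignment.add(open_lits.pop())  # unit clause forces a literal
                changed = True
    return False


# The eight clauses of the input problem in Figure 1.
FORMULA = [[1, 2, -3], [-1, -2, 3], [2, 3, -4], [-2, -3, 4],
           [-1, -3, -4], [1, 3, 4], [-1, 2, 4], [1, -2, -4]]

print(has_at(FORMULA, [1, 2]))  # RUP step adding the clause 1 2 0
print(has_at(FORMULA, [1]))     # the unit clause 1 0 is not yet RUP
```

The first call succeeds because assuming the negations of both literals leads
to a conflict, whereas the unit clause alone triggers no propagation at all on
the initial formula.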

DRAT has a number of advantages over formats based on more traditional 
proof calculi, such as resolution or analytic tableaux. For SAT solvers, DRAT 
proofs are easier to emit because CNF clauses are the native data structures 
that the solvers store and manipulate internally. Whenever a solver obtains a new 
clause, the clause can be simply streamed out to a proof file without any further 
modification. Also, DRAT proofs are more compact than resolution proofs, as 
the latter can become infeasibly large for some classes of SAT problems [7]. 

There is, however, room for further improvement in the DRAT format due to 
the information loss incurred by DRAT proofs. Consider, for instance, the SAT 
problem and proofs shown in Figure 1. The left column is the input problem 
in the DIMACS format, the center column is its DRAT proof, and the right 
column is the equivalent proof in the LRAT format, which can be thought of 
as an enriched version of DRAT with more information. The numbers before 
the first zero on lines without a “d” represent literals: positive numbers denote 
positive literals, while negative numbers denote negative literals. The first clause
of the input formula is $(x_1 \lor x_2 \lor \overline{x_3})$, or equivalently 1 2 -3 0 in DIMACS.


The first lines of both DRAT and LRAT proofs are RUP steps for adding the
clause $(x_1 \lor x_2)$, written 1 2 0. When an LRAT checker verifies this step, it is
informed of the IDs of active clauses (the trailing numbers 1 6 3) relevant for
unit propagation, in the exact order they should be used. Therefore, the LRAT
checker only has to visit the first, sixth, and third clauses and confirm that,
starting with unit literals $\overline{x_1}, \overline{x_2}$, they yield the new unit literals $\overline{x_3}, x_4, \bot$. In
contrast, a DRAT checker verifying the same step must add the literals $\overline{x_1}, \overline{x_2}$
to the active multiset (in this case, the eight initial clauses) and carry out a
blind unit propagation with the whole resulting multiset until contradiction. This
omission of RUP information in DRAT proofs introduces significant overheads 
in proof verification. Although the exact figures vary from problem to problem, 
checking a DRAT proof typically takes approximately twice as long as solving the 
original problem, whereas the verification time for an LRAT proof is negligible 
compared to its solution time. This additional cost of checking DRAT proofs also 
represents a lost opportunity: when a SAT solver emits a RUP step, it knows 
exactly how the new clause was obtained, and this knowledge can (in theory) 
be turned into an LRAT-style RUP annotation, which can cut down verification 
costs significantly if conveyed to the verifier. 
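
To make the contrast concrete, the following Python sketch checks a RUP step
using only the hinted clauses, in the given order, as an LRAT checker would.
The function name and data layout are assumptions for illustration, and the
sketch ignores RAT steps; the clause IDs 1 to 8 are the eight clauses of the
input problem in Figure 1.

```python
def check_rup_hints(clauses, clause, hints):
    """Check a RUP step using only the hinted clause IDs, in order.

    `clauses` maps clause IDs to lists of literals.  Each hinted clause
    must either be falsified (conflict, step accepted) or be unit under
    the literals derived so far; otherwise the step is rejected.
    """
    assignment = {-lit for lit in clause}
    for cid in hints:
        open_lits = {lit for lit in clauses[cid] if -lit not in assignment}
        if not open_lits:
            return True   # hinted clause is falsified: conflict reached
        if len(open_lits) != 1:
            return False  # hint is not unit at this point
        assignment.add(open_lits.pop())
    return False          # hints exhausted without a conflict


CLAUSES = {1: [1, 2, -3], 2: [-1, -2, 3], 3: [2, 3, -4], 4: [-2, -3, 4],
           5: [-1, -3, -4], 6: [1, 3, 4], 7: [-1, 2, 4], 8: [1, -2, -4]}

print(check_rup_hints(CLAUSES, [1, 2], [1, 6, 3]))  # hints in the right order
print(check_rup_hints(CLAUSES, [1, 2], [6, 1, 3]))  # same hints, wrong order
```

Each hint is visited exactly once, so the cost is linear in the length of the
hint list rather than in the size of the active multiset. This strict sketch
rejects misordered hints; a more lenient checker could instead fall back to a
search.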


For the DRAT format, a design choice was made not to include such informa- 
tion since demanding explicit proofs for all steps turned out to be impractical. 
Although it is theoretically possible to always glean the correct RUP annotation 
from the solver state, computing this information can be intricate and costly 
for some types of inferences (e.g. conflict-clause minimization [22]), making it 
harder to support proof logging [25]. Reducing such overheads is particularly 
important for solving satisfiable formulas, as proofs are superfluous for them 
and the penalty for maintaining such proofs should be minimized. We should 
note, however, that proof elaboration need not be an all-or-nothing business; if 
it is infeasible to demand 100% elaborated proofs, we can still ask solvers to fill 
in as many gaps as it is convenient for them to do so, which would still be a 
considerable improvement over handling all of it from the verifier side. 


Inclusion of final clauses is another potential area for improvement over the 
DRAT format. A DRAT proof typically includes many addition steps that do 
not ultimately contribute to the derivation of the empty clause. This is unavoid- 
able in the proof emission phase, since a SAT solver cannot know in advance 
whether a given clause will be ultimately useful, and must stream out the clause 
before it can find out. All such steps, however, should be dropped in the post- 
processing phase in order to compress proofs and speed up verification. The 
most straightforward way of doing this is processing the proof in reverse order 
[6]: when processing a clause $C_{k+1}$, identify all the clauses used to derive $C_{k+1}$,
mark them as ‘used’, and move on to clause $C_k$. For each clause, process it if it
is marked as used, and skip it otherwise. The only caveat of this method is that 
the postprocessor needs to know which clauses were present at the very end of 
the proof, since there is no way to identify which clauses were used to derive the 
DIMACS          DRAT              LRAT
p cnf 4 8       1 2 0             9 1 2 0 1 6 3 0
1 2 -3 0        d 1 -3 2 0        9 d 1 0
-1 -2 3 0       1 3 0             10 1 3 0 9 8 6 0
2 3 -4 0        d 1 4 3 0         10 d 6 0
-2 -3 4 0       1 0               11 1 0 10 9 4 8 0
-1 -3 -4 0      d 1 3 0           11 d 10 9 8 0
1 3 4 0         d 1 2 0
-1 2 4 0        d 1 -4 -2 0
1 -2 -4 0       2 0               12 2 0 11 7 5 3 0
                d -1 4 2 0        12 d 7 3 0
                d 2 -4 3 0
                0                 13 0 11 12 2 4 5 0


Fig. 1. DRAT and LRAT proofs of a SAT problem. All whitespace and alignment is 
not significant; we have aligned lines of the DRAT proof with the corresponding LRAT 
lines (d steps in LRAT may correspond to multiple DRAT d steps). 


empty clause otherwise. Although it is possible to enumerate the final clauses 
by a preliminary forward pass through a DRAT proof, this is clearly unnecessary 
work since SAT solvers know exactly which clauses are present at the end, and 
it is desirable to put this information in the proof in the first place. 
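
The backward marking pass described above can be sketched as follows. To keep
the example short, it assumes that every addition step already lists the IDs
of the clauses used to derive it (as LRAT-style hints would); a real DRAT
postprocessor has to discover these by unit propagation. The function name
and the unused step 99 are hypothetical; the remaining steps and hints are
those of the LRAT proof in Figure 1.

```python
def trim_backward(proof, empty_clause_id):
    """Keep only the addition steps that contribute to the empty clause.

    `proof` is a list of (clause_id, used_ids) pairs in proof order.
    Steps are visited last to first; a step is processed only if it has
    been marked as used, and processing it marks its antecedents.
    """
    used = {empty_clause_id}
    kept = []
    for clause_id, used_ids in reversed(proof):
        if clause_id in used:
            used.update(used_ids)  # mark antecedents as used
            kept.append(clause_id)
        # otherwise the step is skipped and dropped from the proof
    kept.reverse()
    return kept


proof = [(9, [1, 6, 3]), (10, [9, 8, 6]),
         (99, [2, 7]),  # hypothetical step that nothing depends on
         (11, [10, 9, 4, 8]), (12, [11, 7, 5, 3]), (13, [11, 12, 2, 4, 5])]
print(trim_backward(proof, 13))
```

Step 99 is never marked and is therefore dropped, while steps 9 to 13 are
retained in their original order.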


2 The FRAT format 


To address the above issues, we introduce FRAT, a new proof format designed 
to allow fine-grained communication between SAT solvers and elaborators. The 
main differences between FRAT and DRAT are: 


(1) optional annotation of RUP steps, 
(2) inclusion of final clauses, and 
(3) identification of clauses by unique IDs. 


We've already explained the rationale for (1) and (2); (3) is necessary for concise 
references to clauses in deletions and RUP step annotations. More specifically, 
a FRAT proof consists of the following six types of proof steps: 


o: An original step; a clause from the input file. The purpose of these lines is 
to name the clauses from the input with identifiers; they are not required 
to come in the same order as the file, they are not required to be numbered 
in order, and not all steps in the input need appear here. Proof may also 
progress (with a and d steps) before all o steps are added. 

a, l: An addition step, and an optional LRAT-style unit propagation proof
of the step. The proof, if provided, is a sequence of clauses in the current
formula in the order that they become unit. For solver flexibility, they are
allowed to come out of order, but the elaborator is optimized for the case
where they are correctly ordered. For a RAT step, the negative numbers in
the proof refer to the clauses in the active set that contain the negated pivot
literal, followed by the unit propagation proof of the resolvent. See [3] for
more details on the LRAT checking algorithm.

d: A deletion step for deleting the clause with the given ID from the formula. 
The literals given must match the literals in the corresponding addition step 
up to permutation. 

r: A relocation step. The syntax is r ⟨ids⟩ 0, where ⟨ids⟩ has the form
$s_0, t_0, \ldots, s_k, t_k$ and must consist of an even number of clause IDs. It indicates
that the active clause with ID $s_i$ is re-labeled and now has ID $t_i$, for each
$0 \le i \le k$. (This is used for solvers that use pointer identity for clauses, but
also do garbage collection to decrease memory fragmentation.)

f: A finalization step. These steps come at the end of a proof, and provide the 
list of all active clauses at the end of the proof. The clauses may come in any 
order, but every step that has been added and not deleted must be present. 
(For best results, clauses should be finalized in roughly reverse order of when 
they were added.) 


(Our modified version of CADICAL also outputs a seventh kind of step,
t ⟨todo_id⟩ 0, to collect statistics on code paths that produce a steps without
proofs. See Section 3 for how this information is used.)

Figure 1 is an example from [3], which includes a SAT problem in DIMACS 
format, and the proofs of its unsatisfiability in DRAT and LRAT formats. It 
shows how proofs are produced and elaborated via the DRAT toolchain. Figure 
2 shows the corresponding problem and proofs for the FRAT toolchain. Notice 
how the FRAT proof is more verbose than its DRAT counterpart and includes all 
the hints for addition steps, which are reused in the subsequent LRAT proof. 


Binary FRAT The files shown in Figure 2 are in the text version of the FRAT
format, but for efficiency reasons solvers may also wish to use a binary encoding.
The binary FRAT format is exactly the same in structure, but the integers are
encoded using the same variable-length integer encoding used in binary DRAT [9].
Unsigned numbers are encoded in 7-bit little endian, with the high bit set on
each byte except the last. That is, the number


$n = x_0 + 2^7 x_1 + \cdots + 2^{7k} x_k$
(with each $x_i < 2^7$) is encoded as
$1x_0\ 1x_1\ \ldots\ 0x_k$.
Signed numbers are encoded by mapping $n \ge 0$ to $f(n) := 2n$ and $-n$ (with
$n > 0$) to $f(-n) := 2n + 1$, and then using the unsigned encoding. (Incidentally,
the mapping f is not surjective, as it misses 1. But it is used by other formats
so we have decided not to change it.)
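
The encoding just described is small enough to sketch in Python. The function
names are illustrative assumptions, not taken from any existing tool; the
example step is the one used in Figure 4, where the step ID is encoded as an
unsigned number and the literals and hints as signed numbers.

```python
def encode_unsigned(n: int) -> bytes:
    """7-bit little-endian groups; high bit set on every byte but the last."""
    if n < 0:
        raise ValueError("unsigned encoding requires n >= 0")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits plus continuation bit
        n >>= 7
    out.append(n)                      # final byte, high bit clear
    return bytes(out)


def encode_signed(n: int) -> bytes:
    """Map n >= 0 to 2n and -n (n > 0) to 2n + 1, then encode unsigned."""
    return encode_unsigned(2 * n if n >= 0 else 2 * (-n) + 1)


def encode_add_step(step_id, literals, hints=None) -> bytes:
    """Encode an `a` segment and, if hints are given, a following `l` segment."""
    out = bytearray(b"a")
    out += encode_unsigned(step_id)
    for lit in literals:
        out += encode_signed(lit)
    out.append(0)                      # segment terminator
    if hints is not None:
        out += b"l"
        for hint in hints:
            out += encode_signed(hint)
        out.append(0)
    return bytes(out)


step = encode_add_step(9, [-3, -4], [5, 1, 8])
print(step.hex(" "))
```

For the step a 9 -3 -4 0 l 5 1 8 0 this yields the ten bytes listed in the
binary row of Figure 4. A number such as 128 needs two bytes (0x80 0x01), the
first of which carries the continuation bit.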
FRAT
o 1 1 2 -3 0                     f 1 1 2 -3 0
o 2 -1 -2 3 0                    f 2 -2 -1 3 0
o 3 2 3 -4 0                     f 3 2 3 -4 0
o 4 -2 -3 4 0                    f 4 -2 -3 4 0
o 5 -1 -3 -4 0                   f 5 -1 -3 -4 0
o 6 1 3 4 0                      f 6 1 3 4 0
o 7 -1 2 4 0                     f 7 -1 2 4 0
o 8 1 -2 -4 0                    f 8 1 -2 -4 0
a 9 -3 -4 0 l 5 1 8 0            f 9 -3 -4 0
a 10 -4 0 l 9 3 2 8 0            f 10 -4 0
a 11 3 0                         f 11 3 0
a 12 -2 0                        f 12 -2 0
a 13 1 0 l 12 11 1 0             f 13 1 0
a 14 0 l 13 12 10 7 0            f 14 0

LRAT
9 -3 -4 0 5 1 8 0
9 d 5 0
10 -4 0 9 3 2 8 0
10 d 8 3 9 0
11 3 0 10 6 7 2 0
11 d 2 6 0
12 -2 0 11 10 4 0
12 d 4 0
13 1 0 12 11 1 0
13 d 1 11 0
14 0 13 12 10 7 0


Fig. 2. FRAT and LRAT proofs of a SAT problem. To illustrate that proofs are optional, 
we have omitted the proofs of steps 11 and 12 in this example. The steps must still be 
legal RAT steps but the elaborator will derive the proof rather than the solver. 


2.1 Flexibility and extensibility 


The purpose of the FRAT format is for solvers to be able to quickly write down 
what they are doing while they are doing it, with the elaborator stage “picking 
up the pieces” and preparing the proof for consumption by simpler mechanisms 
such as certified LRAT checkers. As such, it is important that we are able to 
concisely represent all manner of proof methods used by modern SAT solvers. 

The high-level syntax of a FRAT file is quite simple: a sequence of “segments”,
each of which begins with a character, followed by zero or more nonzero numbers, 
followed by a 0. In the binary version, each segment similarly begins with a 
printable character, followed by zero or more nonzero bytes, followed by a zero 
byte. (Note that continuation bytes in an unsigned number encoding are always 
nonzero.) This means that it is possible to jump into a FRAT file and find segment 
boundaries by searching for a nearby zero byte. 
⟨proof⟩   → ⟨line⟩*
⟨line⟩    → ⟨orig⟩ | ⟨add⟩ | ⟨del⟩ | ⟨final⟩ | ⟨reloc⟩
⟨add⟩     → ⟨add_seg⟩ | ⟨add_seg⟩ ⟨hint⟩
⟨orig⟩    → o ⟨id⟩ ⟨literal⟩* 0
⟨add_seg⟩ → a ⟨id⟩ ⟨literal⟩* 0
⟨del⟩     → d ⟨id⟩ ⟨literal⟩* 0
⟨final⟩   → f ⟨id⟩ ⟨literal⟩* 0
⟨reloc⟩   → r (⟨id⟩ ⟨id⟩)* 0
⟨hint⟩    → l (⟨id⟩ | −⟨id⟩)* 0
⟨id⟩      → ⟨pos⟩
⟨literal⟩ → ⟨pos⟩ | ⟨neg⟩
⟨neg⟩     → −⟨pos⟩
⟨pos⟩     → [1-9] [0-9]*


Fig. 3. Context-free grammar for the FRAT format. 


text    a  9  -3 -4 0  l  5  1  8  0
binary  61 09 07 09 00 6C 0A 02 10 00


Fig. 4. Comparison of binary and text formats for a step. Note that the step ID 9 uses 
the unsigned encoding, but literals and LRAT style proof steps use signed encoding. 


This is in contrast to binary LRAT, in which add steps are encoded as
a ⟨id⟩ ⟨literal⟩* 0 (±⟨id⟩)* 0, because a random zero byte could either be the
end of a segment or the middle of an add step. Since 0x61, the ASCII
representation of a, is also a valid step ID (encoding the signed number −48), in a
sequence such as (a ⟨nonzero⟩* 0)*, the literals and the steps cannot be locally
disambiguated.


The local disambiguation property is important for our FRAT elaborator, 
because it means that we can efficiently parse FRAT files generated by solvers 
backward, reading the segments in reverse order so that we can perform backward 
checking in a single pass. 
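
The local disambiguation property can be illustrated with a short Python
sketch that walks a binary FRAT buffer from the end and yields its segments in
reverse order. It relies only on the fact stated above that zero bytes occur
solely as segment terminators. The function name is an assumption, the whole
file is held in memory for simplicity, and decoding the payload into numbers
is left out.

```python
def segments_backward(data: bytes):
    """Yield (kind, payload) for each segment of `data`, last segment first.

    `kind` is the segment character and `payload` the raw (still encoded)
    bytes between that character and the terminating zero byte.
    """
    end = len(data)
    while end > 0:
        if data[end - 1] != 0:
            raise ValueError("segment does not end with a zero byte")
        # The previous zero byte (if any) terminates the preceding segment.
        start = data.rfind(b"\x00", 0, end - 1) + 1
        if start == end - 1:
            raise ValueError("empty segment without a segment character")
        yield chr(data[start]), data[start + 1:end - 1]
        end = start


# The binary step from Figure 4: an `a` segment followed by an `l` segment.
data = bytes.fromhex("61 09 07 09 00 6C 0A 02 10 00")
for kind, payload in segments_backward(data):
    print(kind, payload.hex(" "))
```

The hint segment is produced before the addition segment it belongs to, which
is the order a backward-checking elaborator would consume them in.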


DRAT is based on adding clauses that are RAT with respect to the active 
formula. It is quite versatile and sufficient for most common cases, covering 
CDCL steps, hyper-resolution, unit propagation, blocked clause elimination and 
many other techniques. However, we recognize that not all methods can be cast 
into this format, or are too expensive to translate into this proof system. In 
this work we define only six segment characters (a, d, f, l, o, r), that suffice
to cover methods used by SAT solvers targeting DRAT. However, the format is 
forward-compatible with new kinds of proof steps, that can be indicated with 
different characters. 
For example, CRYPTOMINISAT [21] is a SAT solver that also supports XOR
clause extraction and reasoning, and can derive new XOR clauses using proof 
techniques such as Gaussian elimination. Encoding this in DRAT is quite compli- 
cated: The XOR clauses must be Tseitin transformed into CNF, and Gaussian 
elimination requires a long resolution proof. Participants in SAT competitions 
therefore turn this reasoning method off as producing the DRAT proofs is either 
too difficult or the performance gains are canceled out by the overhead. 

FRAT resolves this impasse by allowing the solver to express itself with min- 
imal encoding overhead. A hypothetical extension to FRAT would add new seg- 
ment characters to allow adding and deleting XOR clauses, and a new proof 
method for proof by linear algebra on these clauses. The FRAT elaborator would 
be extended to support the new step kinds, and it could either perform the 
expensive translation into DRAT at that stage (only doing the work when it is 
known to be needed for the final proof), or it could pass the new methods on 
to some XLRAT backend format that understands these steps natively. Since the 
extension is backward compatible, it can be done without impacting any other 
FRAT-producing solvers. 


3 FRAT-producing solvers 


The FRAT proof format is designed to allow conversion of DRAT-producing 
solvers into FRAT-producing solvers at minimal cost, both in terms of implemen- 
tation effort and impact on runtime efficiency. In order to show the feasibility of 
such conversions, we chose two popular SAT solvers, CADICAL¹ and MINISAT²,
to modify as case studies. The solvers were chosen to demonstrate two different 
aspects of feasibility: since MINISAT forms the basis of the majority of modern 
SAT solvers, an implementation using MINISAT shows that the format is widely 
applicable, and provides code which developers can easily incorporate into a 
large number of existing solvers. CADICAL, on the other hand, is a cutting- 
edge modern solver which employs a wide range of sophisticated optimizations. 
A successful conversion of CADICAL shows that the technology is scalable, and 
is not limited to simpler toy examples. 

As mentioned in Section 2, the main solver modifications required for FRAT 
production are inclusions of clause IDs, finalization steps, and LRAT proof traces. 
The provision of IDs requires some non-trivial modification as many solvers, in- 
cluding CADICAL and MINISAT, do not natively keep track of clause IDs, and 
DRAT proofs use literal lists up to permutation for clause identity. In CADICAL, 
we added IDs to all clauses, leading to 8 bytes overhead per clause. Additionally, 
unit clauses are tracked separately, and ensuring proper ID tracking for unit 
clauses resulted in some added code complexity. In MINISAT, we achieved 0 byte 
overhead by using the pointer value of clauses as their ID, with unit clauses hav- 
ing computed IDs based on the literal. This requires the use of relocation steps 
during garbage collection. The output of finalization steps requires identifying
the active set from the solver state, which can be subtle depending on the solver
architecture, but is otherwise a trivial task assuming knowledge of the solver.


1 https://github.com/digama0/cadical
2 https://github.com/digama0/minisat
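
The relocation requirement can be illustrated with a small sketch. It assumes, hypothetically, that clause IDs are slot positions in a clause arena (standing in for pointer values) and that a compacting garbage collector moves live clauses to the front. The function names and the textual `r old new ... 0` rendering are assumptions made for illustration and are not taken from the modified MINISAT.

```python
def compact_clause_arena(arena):
    """Compact a clause arena whose 1-based slot index doubles as the clause ID.

    `arena` is a list of clauses; None marks a freed slot. Returns the
    compacted list and the (old_id, new_id) pairs for clauses that moved.
    """
    live = []
    relocations = []
    for old_id, clause in enumerate(arena, start=1):
        if clause is None:
            continue
        new_id = len(live) + 1
        live.append(clause)
        if new_id != old_id:
            relocations.append((old_id, new_id))
    return live, relocations


def format_relocation_step(relocations):
    """Render the moved IDs as one relocation step (assumed text syntax)."""
    if not relocations:
        return ""
    pairs = " ".join(f"{old} {new}" for old, new in relocations)
    return f"r {pairs} 0"


if __name__ == "__main__":
    arena = [(1, 2), None, (-1,), None, (3,)]
    live, moved = compact_clause_arena(arena)
    print(live, format_relocation_step(moved))
```

Emitting the pairs right after compaction keeps the proof's notion of clause identity in step with the solver's, at no per-clause memory cost.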

LRAT trace production is the heart of the work, and requires the solver to 
justify each addition step. This modification is easier to apply to MINISAT,
as it only adds clauses in a few places, and already tracks the “reasons” for
each literal in the current assignment, which makes the proof trace straightfor- 
ward. In contrast, CADICAL has over 30 ways to add clauses; in addition to the 
main CDCL loop, there are various in-processing and optimization passes that 
can create new clauses. 

To accommodate this complexity, we leverage the flexibility of the FRAT
format: because hints are optional, we can focus on the most common clause
addition steps and reap most of the runtime advantage with only a few changes.
The FRAT elaborator falls back on the standard elaboration-by-unit propagation 
when proofs are not provided, so future work can add more proofs to CADICAL 
without any changes to the toolchain. 

To maximize the efficacy of the modification, we used a simple method to find 
places to add proofs. In the first pass, we added support for clause ID tracking
and finalization, and changed the output format to FRAT syntax. Since CADICAL
was already producing DRAT proofs, we could easily identify the addition
and removal steps and replace them with a and d steps. Once this is done,
CADICAL produces valid FRAT files which can pass through the elaborator and
yield LRAT results, but elaboration is quite slow since the FRAT elaborator is
essentially acting as a less-optimized version of DRAT-trim at this point.

We then find all code paths that lead to an a step being emitted, and add 
an extra call to output a step of the form t (todo_id) 0, where (todo_id) is some 
unique identifier of this position in the code. The FRAT elaborator is configured 
to ignore these steps, so they have no effect, but by running the solver on bench- 
marks we can count how many t steps of each kind appear, and so see which 
code paths are hottest. 
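
One possible way to tally the t steps is sketched below. It assumes a text-mode FRAT file with one whitespace-separated step per line; the helper names are hypothetical, and real FRAT output may use a binary encoding that would need a different reader.

```python
from collections import Counter


def count_todo_steps(lines):
    """Count `t <todo_id> 0` steps per todo_id in a text-format FRAT stream."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        # Only lines of the exact shape "t <id> 0" are counted.
        if len(tokens) == 3 and tokens[0] == "t" and tokens[2] == "0":
            counts[int(tokens[1])] += 1
    return counts


def hottest_paths(lines, n=4):
    """Return the n most frequent todo_ids with their counts."""
    return count_todo_steps(lines).most_common(n)


if __name__ == "__main__":
    sample = ["a 5 1 2 0", "t 3 0", "a 6 -1 0", "t 3 0", "a 7 2 0", "t 9 0"]
    print(hottest_paths(sample))
```

The identifiers at the top of the resulting list point at the code paths where adding a proof hint is likely to pay off first.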

The basic idea is that elaborating a step that has a proof is much faster than 
elaborating a step that doesn’t, but the distribution of code paths leading to 
add steps is highly skewed, so adding proofs to the top 3 or 4 paths already
decreases the elaboration time by over 70%. At the time of writing, about one 
third of CADICAL code paths are covered, and median elaboration time is 
about 15% that of DRAT-trim (see Section 5). (This is despite the fact that our 
elaborator could stand to improve on low level optimizations, and runs about 
twice as slow as DRAT-trim when no proofs are provided.) 


4 Elaboration 


The main tasks of the FRAT-to-LRAT elaborator³ are provision of missing RUP
step hints, elimination of irrelevant clause additions, and re-labeling clauses with
new IDs. These tasks are performed in two separate ‘passes’ over files, writing


3 The elaborator used for this paper can be found at
https://github.com/digama0/frat/tree/tacas.
Algorithm 1 First pass (elaboration): FRAT to elaborated reversed FRAT

function ELABORATE(cert)
  F ← ∅, revcert ← []        ▷ F is a map ID → clause with a bool marking
  for step in reverse(cert) do
    case step of
      o(i, C) ⇒
        C′ ← F.remove(i); assert C′ ≃ C
        if C′.marked then revcert ← revcert, o(i, C)
      a(i, C, proof?) ⇒
        C′ ← F.remove(i); assert C′ ≃ C
        if C′.marked then
          steps′ ← case proof? of
            ε ⇒ PROVERAT(F, C)
            l(steps) ⇒ CHECKHINT(F, C, steps)
          for j in {j | ±j ∈ steps′} do
            if ¬F_j.marked then
              F_j.marked ← true
              revcert ← revcert, d(j, F_j)
          revcert ← revcert, a(i, C, l(steps′))
      d(i, C) ⇒ F.insert(i, C, marked: false)
      f(i, C) ⇒ F.insert(i, C, marked: C = ⊥)
      r(R) ⇒
        R′ ← {(s, t) ∈ R | ∃x. (t, x) ∈ F}
        F ← F − {(t, F_t) | (s, t) ∈ R′} + {(s, F_t) | (s, t) ∈ R′}
        revcert ← revcert, r(R′)
  return revcert


and reading directly to disk (so the entire proof is never in memory at once). In 
the first pass, the elaborator reads the FRAT file and produces a temporary file 
(which may be stored on disk or in memory depending on configuration). The 
temporary file is essentially the original FRAT file with the steps put in reverse 
order, while satisfying the following additional conditions: 


— All a steps have annotations. 

— Every clause introduced by an o, a, or r step ultimately contributes to the 
proof of ⊥. Note that we consider an r step as using an old clause with the
old ID and introducing a new clause with the new ID. 

— There are no f steps. 


Algorithm 1 shows the pseudocode of the first pass, ELABORATE(cert). Here, 
cert is the FRAT proof obtained from the SAT solver, and the pass works by 
iterating over its steps in reverse order, producing the temporary file revcert. 
The map F maintains the active formula as a map with unique IDs for each 
clause (double inserts and removes to F are always error conditions), and the 
effect of each step is replayed backwards to reconstruct the solver’s state at the 
point each step was produced. 
Algorithm 2 Second pass (renumbering): elaborated reversed FRAT to LRAT

function RENUMBER(Forig, revcert)
  M ← ∅, k ← |Forig|, lrat ← []        ▷ M is a map ID → ID
  for step in reverse(revcert) do
    case step of
      o(i, C) ⇒ find j such that C ≃ (Forig)_j; M.insert(i, j)
      a(i, C, l(steps)) ⇒
        k ← k + 1; M.insert(i, k)
        lrat ← lrat, add(k, C, [±M_j | ±j ∈ steps])
        if C = ⊥ then return lrat
      d(i, C) ⇒ lrat ← lrat, del(k, M.remove(i))
      r(R) ⇒ M ← M − {(s, M_s) | (s, t) ∈ R} + {(t, M_s) | (s, t) ∈ R}
  assert false        ▷ no proof of ⊥ found


— All d or f clauses are immediately inserted to F, but (with the exception of 
the empty clause) are marked as not necessarily required for the proof, and 
the d step is deferred until just before its first use (or rather, just after the 
last use). 

— PROVERAT(F,C), not given here, checks that C has property RAT with re- 
spect to F, and produces a step list in LRAT format (where positive numbers 
are clause references in a unit propagation proof, and negative numbers are 
used in RAT steps, indicating the clauses to resolve against). 

— CHECKHINT(F, C, steps) does the same thing, but it has been given a candi- 
date proof, steps. It will check that steps is a valid proof, and if so, returns 
it, but the steps in the unit propagation proof may be out of order (in which 
case they are reordered to LRAT conformity), and if the given proof is not 
salvageable, it falls back on PROVERAT (F, C) to construct the proof. 
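
The first pass can be sketched in Python under several simplifying assumptions, all of which are choices of this sketch and not of the elaborator itself: steps are tuples such as `("a", id, clause, hint)`, only RUP (unit propagation) is supported and not full RAT, any hint supplied by the solver is ignored and re-derived, relocation (r) steps are omitted, and the derived hints are not trimmed to the clauses actually needed for the conflict.

```python
def propagate_rup(formula, clause):
    """Return IDs of clauses used to refute the negation of `clause`, or None.

    `formula` maps id -> [literals, marked]. Each returned ID was unit or
    falsified at the moment it was used, so the list is in propagation order.
    """
    assignment = {-lit for lit in clause}  # literals currently assumed true
    used = []
    changed = True
    while changed:
        changed = False
        for cid, (lits, _marked) in formula.items():
            if any(l in assignment for l in lits):
                continue  # clause already satisfied
            unassigned = [l for l in lits if -l not in assignment]
            if not unassigned:
                used.append(cid)
                return used  # conflict reached
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                used.append(cid)
                changed = True
    return None


def elaborate(cert):
    """Backward pass over a FRAT-like step list; returns the reversed result."""
    formula = {}  # id -> [clause, marked]
    revcert = []
    for step in reversed(cert):
        kind, cid, clause = step[0], step[1], step[2]
        if kind in ("d", "f"):
            assert cid not in formula, "double insert"
            # Only the finalized empty clause starts out marked.
            formula[cid] = [clause, kind == "f" and len(clause) == 0]
        elif kind == "o":
            stored, marked = formula.pop(cid)
            assert sorted(stored) == sorted(clause)
            if marked:
                revcert.append(("o", cid, clause))
        elif kind == "a":
            stored, marked = formula.pop(cid)
            assert sorted(stored) == sorted(clause)
            if marked:
                hint = propagate_rup(formula, clause)
                assert hint is not None, "clause is not RUP"
                for j in hint:
                    if not formula[j][1]:
                        formula[j][1] = True
                        # First use seen in reverse is the last use going forward.
                        revcert.append(("d", j, formula[j][0]))
                revcert.append(("a", cid, clause, hint))
    return revcert
```

Reversing the returned list gives a proof in which every remaining step contributes to the empty clause and each deletion sits just after the last use of its clause.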


In the second pass, RENUMBER(Forig, revcert) reads the input DIMACS file 
and the temporary file from the first pass, and produces the final result in LRAT 
format. Not much checking happens in this pass, but we ensure that the o steps 
in the FRAT file actually appear (up to permutation) in the input. The state that 
is maintained in this pass is a list of all active clause IDs, and the corresponding 
list of LRAT IDs (in which original steps are always numbered sequentially in 
the file, and add/delete steps use a monotonic counter that is incremented on 
each addition step). 
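
A corresponding sketch of the second pass is given below. It assumes the same hypothetical tuple encoding of steps as an in-memory list (the real elaborator streams files), represents LRAT lines as tuples instead of text, and omits relocation steps.

```python
def renumber(original, revcert):
    """Turn an elaborated, reversed step list into LRAT-like tuples.

    `original` lists the input clauses in DIMACS order, so the clause at
    position n (1-based) receives LRAT ID n. `revcert` holds tuples
    ("o", id, clause), ("a", id, clause, hints) and ("d", id, clause).
    """
    id_map = {}          # FRAT id -> LRAT id
    k = len(original)    # monotonic counter for added clauses
    lrat = []
    for step in reversed(revcert):
        kind = step[0]
        if kind == "o":
            _, cid, clause = step
            # Match up to permutation; the first matching input clause is used.
            j = next(n for n, c in enumerate(original, start=1)
                     if sorted(c) == sorted(clause))
            id_map[cid] = j
        elif kind == "a":
            _, cid, clause, hints = step
            k += 1
            id_map[cid] = k
            mapped = [id_map[h] if h > 0 else -id_map[-h] for h in hints]
            lrat.append(("add", k, clause, mapped))
            if len(clause) == 0:
                return lrat  # empty clause derived: proof complete
        elif kind == "d":
            _, cid, _clause = step
            lrat.append(("del", k, id_map.pop(cid)))
    raise AssertionError("no proof of the empty clause found")
```

Because the map only ever holds the currently active IDs, the memory needed here tracks the size of the active set and not the length of the proof.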

The resulting LRAT file can then be verified by any of the verified LRAT 
checkers [26] (and our toolchain also includes a built-in LRAT checker for verifi- 
cation). 

The 2-pass algorithm is used in order to optimize memory usage. The result 
of the first pass is streamed out so that the intermediate elaboration result does 
not have to be stored in memory simultaneously. Once the temporary file is 
streamed out, we need at least one more pass to reverse it (even if the labels did 
not need renumbering) since its steps are in reverse order. 
We performed benchmarks comparing our FRAT toolchain (modified CADICAL
+ FRAT-to-LRAT elaborator written in Rust) against the DRAT toolchain (stan- 
dard CADICAL + DRAT-trim) and measured their execution times, output file 
sizes, and peak memory usages while solving SAT instances in the DIMACS 
format and producing their LRAT proofs. All tests were performed on Amazon 
EC2 r5a.xlarge instances, running Ubuntu Server 20.04 LTS on 2.5 GHz AMD 
EPYC 7000 processors with 32 GB RAM and 512 GB SSD. 

The instances used in the benchmark were chosen by selecting all 97 instances 
for which default-mode CADICAL returned ‘UNSAT’ in the 2019 SAT Race
results. One of these instances was excluded because DRAT-trim exhausted the 
available 32GB memory and failed during elaboration. Although this instance 
was not used for comparisons below, we note that it offers further evidence of the 
FRAT toolchain’s efficient use of memory, since the FRAT-to-LRAT elaboration 
of this instance succeeded on the same system. The remaining 96 instances were 
used for performance comparison of the two toolchains.⁴

Figures 5 and 6 show the time and memory measurements from the benchmark.
We can see from Figure 5 that the FRAT toolchain is significantly faster
than the DRAT toolchain. Although the modified CADICAL tends to be slightly
(6%) slower than standard CADICAL, that overhead is more than compensated
by a median 84% decrease in elaboration time (the sums over all instances are
1700.47 s in the DRAT toolchain vs. 381.70 s in the FRAT toolchain, so the
average is down by 77%). If we include the time of the respective solvers, the
FRAT + modified CADICAL toolchain takes a median 53.6% of the time taken by
the DRAT + CADICAL toolchain. The difference in the toolchains’ time budgets is clear: the
DRAT toolchain spends 42% of its time in solving and 58% in elaboration, while 
FRAT spends 85% on solving and only 15% on elaboration. 

Figure 6 shows a dramatic difference in peak memory usage between the 
FRAT and DRAT toolchains. On median, the FRAT toolchain used only 5.4% as 
much peak memory as DRAT. (The average is 318.62 MB, which is 11.98% that 
of the DRAT toolchain’s 2659.07 MB, but this is dominated by the really large 
instances. The maximum memory usage was 2.99 GB for FRAT and 21.5 GB 
for DRAT, but one instance exhausted the available 32 GB in DRAT and is not 
included in this figure.) This result is in agreement with our initial expectations: 
the FRAT toolchain’s 2-pass elaboration method allows it to limit the number of 
clauses held in memory to the size of the active set used by the solver, whereas 
the DRAT toolchain loads all clauses in a DRAT file into memory at once during 
elaboration. This difference suggests that the FRAT toolchain can be used to 
verify instances that would otherwise require more memory than the system 
limit on the DRAT toolchain. 

There were no noticeable differences in the sizes or verification times of LRAT
proofs produced by the two toolchains. On average, LRAT proofs produced by
the FRAT toolchain were 1.873% smaller and 3.314% faster⁵ to check than those
from the DRAT toolchain.


4 A CSV of detailed benchmark results can be found at
https://github.com/digama0/frat/blob/tacas/benchmark/benchmark-results.csv.


Fig. 5. FRAT vs. DRAT time comparison. The datapoints of ‘FRAT total’ and ‘DRAT
total’ show the number of instances that each toolchain could generate LRAT proofs
for within the given time limit. The datapoints of ‘FRAT elab’ and ‘DRAT elab’ show
the number of instances whose intermediate format proof files (FRAT or DRAT) could
be elaborated to LRAT within the given time limit.


One minor downside of the FRAT toolchain is that it requires the storage of a 
temporary file during elaboration, but we do not expect this to be a problem in 
practice since the temporary file is typically much smaller than either the FRAT 
or LRAT file. In our test cases, the average temporary file size was 28.68% and 
47.60% that of FRAT and LRAT files, respectively. In addition, users can run the 
elaborator with the -m option to bypass temporary files and write the temporary 
data to memory instead, which further improves performance but forgoes the
memory conservation that comes with 2-pass elaboration. 

The CADICAL modification is only a prototype, and some of its weaknesses 
show in the data. The general pattern we observed is that on problems for which 
the predicted CADICAL code paths were taken, the generated files have a large 
number of hints and the elaboration time is negligible (the “FRAT elab” line in
Figure 5); but on problems which make use of the more unusual in-processing
operations, many steps with no hints are given to the elaborator, and performance
becomes comparable to DRAT-trim. For solver developers, this means that there
is a very direct relationship between proof annotation effort and mean solution
+ elaboration time. Currently, elaboration of FRAT files with no annotations
(the worst-case scenario for the FRAT toolchain) typically takes slightly more
than twice as long as elaboration of DRAT files with DRAT-trim, likely due to
missing optimizations from DRAT-trim that could be incorporated, but this only
underscores the effectiveness of adding hints to the format.


5 One instance was omitted from the LRAT verification time comparison due to what
seems to be a bug in the standard LRAT checker included in DRAT-trim. Detailed
information regarding this instance can be found at
https://github.com/digama0/frat/blob/tacas/benchmark/README.md.


Fig. 6. FRAT vs. DRAT peak memory usage comparison. Each datapoint shows the
number of instances that each toolchain could successfully generate LRAT proofs for
within the given peak memory usage limit.


6 Related works 


As already mentioned, the FRAT format is most closely related to the DRAT 
format [8], which it seeks to replace as an intermediate output format for SAT 
solvers. It is also dependent on the LRAT format and related tools [3], as the 
FRAT toolchain targets LRAT as the final output format. 

The GRAT format [16] and toolchain also aims to improve elaboration of 
SAT unsatisfiability proofs, but takes a different approach from that of FRAT. It 
retains DRAT as the intermediate format, but uses parallel processing and targets 
a new final format with more information than LRAT in order to improve overall 
performance. GRAT also comes with its own verified checker [17]. 

Specifying and verifying the program correctness of SAT solvers (sometimes 
called the autarkic method, as opposed to the proof-producing skeptical method) 
is a radically different approach to ensuring the correctness of SAT solvers. There 
have been various efforts to verify nontrivial SAT solvers [18,20,19,4,5]. Although 
these solvers have become significantly faster, they cannot compete with the 
(unverified) state-of-the-art solvers. It is also difficult to maintain and modify
certified solvers. Proving the correctness of nontrivial SAT solvers can provide 
new insights about key invariants underlying the used techniques [5]. 

Generally speaking, devising proof formats for automated reasoning tools 
and augmenting the tools with proof output capability is an active research area. 
Notable examples outside SAT solving include the LFSC format for SMT solving 
[23] and the TSTP format for classical first-order ATPs [24]. In particular, the 
recent work on the VERIT SMT solver [1] is motivated by a rationale similar to
that of the FRAT toolchain; the key insight is that a proof production pipeline
is often easier to optimize on the solver side than on the elaborator side, as the 
former has direct access to many types of useful information. 


7 Conclusion 


The test results show that the FRAT format and toolchain made significant per- 
formance gains relative to their DRAT equivalents in both elaboration time and 
memory usage. We take this as confirmation of our initial conjectures that (1) 
there is a large amount of useful and easily extracted information in SAT solvers 
that is left untapped by DRAT proofs, and (2) the use of streaming verification 
is the key to verifying very large proofs that cannot be held in memory at once. 

The practical ramification is that, provided that solvers produce well-anno- 
tated FRAT proofs, the elaborator is no longer a bottleneck in the pipeline. 
Typically, when DRAT-trim hangs it does so either by taking excessive time, or 
by attempting to read in an entire proof file at once and exhausting memory 
(the so-called “uncheckable” proofs that can be produced but not verified). But 
FRAT-to-LRAT elaboration is typically faster than FRAT production, and the 
memory consumption of the FRAT-to-LRAT elaborator at any given point is 
proportional to the memory used by the solver at the same point in the proof. 
Since LRAT verification is already efficient, the only remaining limiting factor is 
essentially the time and memory usage of the solver itself. 

In addition to performance, the other main consideration in the design of the 
FRAT format and toolchain was flexibility of use and extension. The encoding 
of FRAT files allows them to be read and parsed both backward and forward, 
and the format can be modified to include more advanced inferences, as we 
have discussed in the example of XOR steps. The optional l steps allow SAT
solvers to decide precisely when they will provide explicit proofs, thereby pro- 
moting a workable compromise between implementation complexity and runtime 
efficiency. SAT solver developers can begin using the format by producing the 
most bare-bones FRAT proofs with no annotations (essentially DRAT proofs with 
metadata for original/final clauses) and gradually work toward providing more 
complete hints. We hope that this combination of efficiency and flexibility will 
motivate performance-minded SAT solver developers to adopt the format and 
support more robust proof production, which is presently only an afterthought 
in most SAT solvers. 
Abstract. In 2006, Biere, Jussila, and Sinz made the key observation that the
underlying logic behind algorithms for constructing Reduced, Ordered Binary 
Decision Diagrams (BDDs) can be encoded as steps in a proof in the extended 
resolution logical framework. Through this, a BDD-based Boolean satisfiability 
(SAT) solver can generate a checkable proof of unsatisfiability. Such proofs indi- 
cate that the formula is truly unsatisfiable without requiring the user to trust the 
BDD package or the SAT solver built on top of it. 

We extend their work to enable arbitrary existential quantification of the for- 
mula variables, a critical capability for BDD-based SAT solvers. We demonstrate 
the utility of this approach by applying a prototype solver to obtain polynomi- 
ally sized proofs on benchmarks for the mutilated chessboard and pigeonhole 
problems—ones that are very challenging for search-based SAT solvers. 


Keywords: extended resolution, binary decision diagrams, mutilated chessboard, 
pigeonhole problem 


1 Introduction 


When a Boolean satisfiability (SAT) solver returns a purported solution to a Boolean 
formula, its validity can easily be checked by making sure that the solution indeed satis- 
fies the formula. When the formula is unsatisfiable, on the other hand, having the solver 
simply declare this to be the case requires the user to have faith in the solver, a complex 
piece of software that could well be flawed. Indeed, modern solvers employ a number 
of sophisticated techniques to reduce the search space. If one of those techniques is 
invalid or incorrectly implemented, the solver may overlook actual solutions and label 
a formula as unsatisfiable, even when it is not. 

With SAT solvers providing the foundation for a number of different real-world 
tasks, this “false negative” outcome could have unacceptable consequences. For exam- 
ple, when used as part of a formal verification system, the usual strategy is to encode 
some undesired property of the system as a formula. The SAT solver is then used to 
determine whether some operation of the system could lead to this undesirable prop- 
erty. Having the solver declare the formula to be unsatisfiable is an indication that the 
undesirable behavior cannot occur, but only if the formula is truly unsatisfiable. 
Rather than requiring users to place their trust in a complex software system, a
proof-generating solver constructs a proof that the formula is indeed unsatisfiable. The 
proof has a form that can readily be checked by a simple proof checker. Initial work on
checking unsatisfiability results was based on resolution proofs, but modern checkers 
are based on stronger proof systems [16,33]. The checker provides an independent val- 
idation that the formula is indeed unsatisfiable. The checker can even be simple enough 
to be formally verified [9,23,29]. Such a capability has become an essential feature for 
modern SAT solvers. 

In their 2006 papers [21,28], Jussila, Sinz and Biere made the key observation that 
the underlying logic behind algorithms for constructing Reduced, Ordered Binary Deci- 
sion Diagrams (BDDs) [4] can be encoded as steps in a proof in the extended resolution 
logical framework [30]. Through this, a BDD-based Boolean satisfiability solver can 
generate checkable proofs of unsatisfiability for a set of clauses. Such proofs indicate 
that the formula is truly unsatisfiable without requiring the user to trust the BDD pack- 
age or the SAT solver built on top of it. 

In this paper, we refine these ideas to enable a full-featured, BDD-based SAT solver. 
Chief among these is the ability to perform existential quantification on arbitrary vari- 
ables. (Jussila, Sinz, and Biere [21] extended their original work [28] to allow exis- 
tential quantification, but only for the root variable of a BDD.) In addition, we allow 
greater flexibility in the choice of variable ordering and the order in which conjunction 
and quantification operations are performed. This combination allows a wide range of 
strategies for creating a sequence of BDD operations that, starting with a set of input 
clauses, yield the BDD representation of the constant function 0, indicating that the for- 
mula is unsatisfiable. Using the extended-resolution proof framework, these operations 
can generate a proof showing that the original set of clauses logically implies the empty 
clause, providing a checkable proof that the formula is unsatisfiable. 

As the experimental results demonstrate, our refinements enable a proof-generating 
BDD-based SAT solver to achieve polynomial performance on several classic “hard” 
problems [1,15]. Since the performance of a proof-generating SAT solver affects not 
only the runtime of the program, but also the length of the proofs generated, achieving 
polynomial performance is an important step forward. Our results for these benchmarks 
rely on a novel approach to ordering the conjunction and quantification operations, 
inspired by symbolic model checking [7]. 

This paper is structured as follows. First, it provides a brief introduction to the res- 
olution and extended resolution logical frameworks and to BDDs. Then we show how 
a BDD-based SAT solver can generate proofs by augmenting algorithms for comput- 
ing the conjunction of two functions represented as BDDs, and for checking that one 
function logically implies another. We then describe our prototype implementation and 
evaluate its performance on several classic problems. We conclude with some general 
observations and suggestions for further work. 


2 Preliminaries 


Given a Boolean formula over a set of variables $\{x_1, x_2, \ldots, x_n\}$, a SAT solver attempts
to find an assignment to these variables that will satisfy the formula, or it declares 
that the formula is unsatisfiable. As is standard practice, we use the term literal to
refer to either a variable or its complement. Most SAT solvers use Boolean formulas 
expressed in conjunctive normal form, where the formula consists of a set of clauses, 
each consisting of a set of literals. Each clause is a disjunction: if an assignment sets 
any of its literals to true, the clause is considered to be satisfied. The overall formula is 
a conjunction: a satisfying assignment must satisfy all of the clauses. 

We write $\top$ to denote both tautology and logical truth, and $\bot$ to represent both an
empty clause and logical falsehood. When writing clauses, we omit disjunction symbols
and use overlines to denote negation, writing $u \lor \bar{v} \lor w$ as $u\,\bar{v}\,w$.


2.1 (Extended) Resolution Proofs 


Robinson [26] observed that a single inference rule could form the basis for a refutation 
theorem-proving technique for first-order logic. Here, we consider its specialization to 
propositional logic. For clauses of the form $C \lor x$ and $\bar{x} \lor D$, the resolution rule derives
the new clause $C \lor D$. This inference is written with a notation showing the required
conditions above a horizontal line, and the resulting inference (the resolvent) below:


\[ \frac{C \lor x \qquad \bar{x} \lor D}{C \lor D} \]


Resolution provides a mechanism for proving that a set of clauses is unsatisfiable. Sup- 
pose the input consists of m clauses. A resolution proof is given as a trace consisting of 
a series of steps S, where each step $s_i$ consists of a clause $C_i$ and a (possibly empty) list
of antecedents $A_i$, where each antecedent is the index of one of the previous steps. The
first set of steps, denoted $S_m$, consists of the input clauses without any antecedents.
Each successive step then consists of a clause and a set of antecedents, such that the
clause can be derived from the clauses in the antecedents by one or more resolution
steps. It follows by transitivity that for each step $s_i$ with $i > m$, clause $C_i$ is logically
implied by the input clauses, written $S_m \vdash C_i$. If, through a series of steps, we can reach
a step $s_t$ where $C_t$ is the empty clause, then the trace provides a proof that $S_m \vdash \bot$,
i.e., the set of input clauses is not satisfiable.
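
The trace format just described can be checked mechanically. The sketch below is a minimal illustration and makes one assumption beyond the text: the antecedents of each step are listed in an order that forms a linear resolution chain, with exactly one clashing literal at every link. The function names are hypothetical.

```python
def resolve_chain(clauses):
    """Resolve a list of clauses left to right and return the final resolvent."""
    current = set(clauses[0])
    for nxt in clauses[1:]:
        pivots = [lit for lit in current if -lit in nxt]
        if len(pivots) != 1:
            raise ValueError("expected exactly one clashing literal")
        pivot = pivots[0]
        current = (current - {pivot}) | (set(nxt) - {-pivot})
    return frozenset(current)


def check_trace(inputs, derived):
    """Check a resolution trace.

    `inputs` are the m input clauses (steps 1..m); `derived` is a list of
    (clause, antecedents) pairs with 1-based antecedent indices. Returns True
    only if every derived clause follows from its antecedents and the last
    step is the empty clause.
    """
    trace = [frozenset(c) for c in inputs]
    for clause, antecedents in derived:
        if not antecedents or not all(1 <= a <= len(trace) for a in antecedents):
            return False
        try:
            result = resolve_chain([trace[a - 1] for a in antecedents])
        except ValueError:
            return False
        if result != frozenset(clause):
            return False
        trace.append(frozenset(clause))
    return len(trace[-1]) == 0


if __name__ == "__main__":
    inputs = [(1, 2), (-1, 2), (1, -2), (-1, -2)]
    derived = [((2,), [1, 2]), ((-2,), [3, 4]), ((), [5, 6])]
    print(check_trace(inputs, derived))
```

A checker of this kind never needs to search: it only replays the steps the solver recorded, which is what keeps it simple enough to trust.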

Tseitin [30] introduced the extended-resolution proof framework in 1966. It allows 
the addition of new extension variables to a resolution proof in a manner that preserves 
the integrity of the proof. In particular, in introducing variable e, there must be an
accompanying set of clauses that encode $e \leftrightarrow F$, where F is a formula over variables
(both original and extension) that were introduced earlier. These are referred to as the 
defining clauses for extension variable e. Variable e then provides a shorthand notation 
by which F can be referenced multiple times. Doing so can reduce the size of a clausal 
representation of a problem by an exponential factor. 

An extension variable e is introduced into the proof by including its defining clauses 
in the list of clauses being generated. The proof checker must ensure that these added 
clauses do not artificially restrict the set of satisfying solutions. The checker can do this 
by making sure that the defining clauses are blocked with respect to variable e [22]. That 
is, for each defining clause C containing literal $e$ and each defining clause D containing
literal $\bar{e}$, there must be some literal $l$ in C such that its complement $\bar{l}$ is in D. As a result,
resolving clauses C and D will yield a tautology.
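
The blocking condition is easy to test on a concrete definition. The sketch below checks it for a set of defining clauses using DIMACS-style integer literals; the function name is hypothetical, and the sketch only covers the pairwise condition stated above (a complete checker would also confirm that the extension variable does not occur in any earlier clause).

```python
def defining_clauses_blocked(clauses, e):
    """Check that every clause with literal e clashes with every clause with -e.

    For each C containing e and each D containing -e there must be a literal
    l in C, other than e itself, whose complement occurs in D.
    """
    positive = [c for c in clauses if e in c]
    negative = [c for c in clauses if -e in c]
    return all(
        any(lit != e and -lit in d for lit in c)
        for c in positive
        for d in negative
    )


if __name__ == "__main__":
    # Defining clauses for e <-> (a AND b), with a = 1, b = 2, e = 3.
    and_definition = [(-3, 1), (-3, 2), (3, -1, -2)]
    print(defining_clauses_blocked(and_definition, 3))
```

For the conjunction example the check succeeds, whereas a pair such as the unit clause (e) together with (¬e ∨ a) fails it, since those two clauses would force a to be true.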
Tseitin transformations are commonly used to encode a logic circuit or formula as a
set of clauses without requiring the formulas to be “flattened” into a conjunctive normal 
form over the circuit inputs or formula variables. These introduced variables are called 
Tseitin variables and are considered to be part of the input formula. An extended reso- 
lution proof takes this concept further by introducing additional variables as part of the 
proof. Some problems for which the minimum resolution proof must be of exponential 
size can be expressed with polynomial-sized proofs in extended resolution [8]. 

To validate the proofs, we use a clausal proof system, known as Resolution Asym- 
metric Tautology (RAT), that generalizes extended resolution [32]. RAT is used in in- 
dustry and to validate the results of the SAT competitions [18]. There are various fast 
and formally-verified RAT proof checkers [10,23,29]. 

Clausal proofs also allow the removal of clauses. In our use, we delete clauses when 
the program can determine that they will not be referenced as antecedents for any suc- 
ceeding clauses. As the experimental results of Section 4 demonstrate, deleting clauses 
that are no longer needed can substantially reduce the number of clauses the checker 
must track while processing a proof. 


2.2 Binary Decision Diagrams 


Reduced, Ordered Binary Decision Diagrams (which we refer to as simply “BDDs”) 
provide a canonical form for representing Boolean functions, and an associated set of 
algorithms for constructing them and testing their properties. A number of tutorials have 
been published [2,5,6], providing a background on BDDs and their algorithms.

With BDDs, functions are defined over a set of variables $X = \{x_1, x_2, \ldots, x_n\}$.
We let $L_1$ and $L_0$ denote the two leaf nodes, representing the constant functions 1 and
0, respectively. Each nonterminal node u has an associated variable Var(u) and children
Hi(u), indicating the case where the node variable has value 1, and Lo(u), indicating
the case where the node variable has value 0.

Nodes are stored in a unique table, indexed by the key (Var(u), Hi(u), Lo(u)), so
that isomorphic nodes are never created. The nodes are shared across all of the generated
BDDs [24]. In presenting algorithms, we assume a function GETNODE($x$, $u_1$, $u_0$)
that checks the unique table for a node with variable $x$ and children $u_1$ and $u_0$. It either
returns the node stored there, or it creates a new node and enters it into the table.
With this table, we can guarantee that the subgraphs with root nodes u and v represent 
the same Boolean function if and only if u = v. We can therefore identify Boolean 
functions with their BDD root nodes. 

BDD packages support multiple operations for constructing and testing the prop- 
erties of Boolean functions represented by BDDs. A number of these are based on the 
Apply algorithm [4]. Given BDDs u and v representing functions f and g, respectively, 
and a Boolean operation (e.g., AND), the algorithm generates the BDD representation 
$w$ of the operation applied to those functions (e.g., $f \land g$). For each operation, the program
maintains an operation cache indexed by the argument nodes $u$ and $v$, mapping
to the result node w. With this cache, the worst case number of recursive steps required 
by the algorithm is bounded by the product of the sizes (in nodes) of the arguments. 

We use the term APPLYAND to refer to the Apply algorithm for Boolean operation
$\land$ and APPLYOR to refer to the Apply algorithm for Boolean operation $\lor$.


3 Proof Generation During BDD Construction


In our formulation, every newly created BDD node u is assigned an extension variable. 
(As notation, we use the same name for the node and for its extension variable.) We 
then extend the Apply algorithm to generate proofs based on the recursive structure of 
the BDD operations. 

Let $S_m$ denote the set of input clauses. Our goal is to generate a proof that $S_m \vdash
\bot$, i.e., there is no satisfying assignment for these clauses. Our BDD-based approach
generates a sequence of BDDs with root nodes $u_1, u_2, \ldots$, the last of which is $L_0$, based
on a combination of the following operations. (The exact sequencing of operations is
determined by the evaluation mechanism, as described in Section 4.)


1. For input clause $C_i$, generate its BDD representation $u_i$ using a series of APPLYOR
operations.

2. For roots $u_j$ and $u_k$, generate the BDD representation of their conjunction $u_l =
u_j \land u_k$ using the APPLYAND operation.

3. For root $u_j$ and some set of variables $Y \subseteq X$, perform existential quantification:
$u_k = \exists Y\, u_j$.


Although the existential quantification operation is not mandatory for a BDD-based 
SAT solver, it can greatly improve its performance [13]. It is the BDD counterpart to 
Davis-Putnam variable elimination on clauses [11]. As the notation indicates, there are 
often multiple variables that can be eliminated simultaneously. Although the operation 
can cause a BDD to increase in size, it generally causes a reduction. Our experimental 
results demonstrate the importance of this operation. 

As these operations proceed, we simultaneously generate a set of proof steps. The 
details of each step are given later in the presentation. For each BDD generated, we 
maintain the proof invariant that its root node $u_j$ satisfies $S_m \vdash u_j$.


1. Following the generation of the BDD $u_i$ for clause $C_i$, we also generate a proof
that $C_i \vdash u_i$. This is described in Section 3.1.
2. Justifying the conjunctions requires two parts:
(a) Using a modified version of the APPLYAND algorithm, we follow the structure
of its recursive calls to generate a proof that the algorithm preserves implication:
$u_j \land u_k \to u_l$. This is described in Section 3.2.
(b) This implication can be combined with the earlier proofs that $S_m \vdash u_j$ and
$S_m \vdash u_k$ to prove $S_m \vdash u_l$.
3. Justifying the quantification also requires two parts:
(a) Following the generation of $u_k$ via existential quantification, we perform a separate
check that $u_j \to u_k$. This check uses a proof-generating version of the
Apply algorithm for implication testing that we refer to as PROVEIMPLICATION.
This is described in Section 3.3.
(b) This implication can be combined with the earlier proof that $S_m \vdash u_j$ to prove
$S_m \vdash u_k$.


As case 3(a) states, we do not attempt to track the detailed logic underlying the
quantification operation. Instead, we run a separate check that the quantification preserves
implication. As is the case with many BDD packages, our implementation can
perform existential quantification of an arbitrary set of variables in a single pass over
the argument BDD. A single implication test suffices for the entire quantification.

Sinz and Biere’s formulation of proof generation by a BDD-based SAT solver [28]
introduces special extension variables $n_1$ and $n_0$ to represent the BDD leaves $L_1$ and
$L_0$. Their proof then includes unit clauses $n_1$ and $\overline{n}_0$ to force these variables to be set to
1 and 0, respectively. This formulation greatly reduces the number of special cases to
consider in the proof-generating version of the APPLYAND operation, but it complicates
the generation of resolution proofs for the implication test. Instead, we directly associate
leaves $L_1$ and $L_0$ with $\top$ and $\bot$, respectively.

The n variables in the input clauses all have associated BDD variables. The proof 
then introduces an extension variable every time a new BDD node is created. In the fol- 
lowing presentation, we use the node name (e.g., u) to indicate the associated extension 
variable. In the actual implementation, the extension variable identifier (an integer) is 
stored as one of the fields in the node representation. 

When creating a new node, the GETNODE function adds (up to) four defining 
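clauses for the associated extension variable. For node $u$ with variable $\mathrm{Var}(u) = x$,
$\mathrm{Hi}(u) = u_1$, and $\mathrm{Lo}(u) = u_0$, the clauses are:


Notation    Formula                              Clause
HD(u)       $x \to (u \to u_1)$                  $\overline{x}\,\overline{u}\,u_1$
LD(u)       $\overline{x} \to (u \to u_0)$       $x\,\overline{u}\,u_0$
HU(u)       $x \to (u_1 \to u)$                  $\overline{x}\,\overline{u}_1\,u$
LU(u)       $\overline{x} \to (u_0 \to u)$       $x\,\overline{u}_0\,u$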


The names for these clauses combine an indication of whether they correspond to variable
$x$ being 1 (H) or 0 (L) and whether they form an implication from the node down
to its child (D) or from the child up to its parent (U). When either node $u_0$ or $u_1$ is a leaf
node, some of these clauses degenerate to tautologies. Such clauses are omitted from
the proof. Each clause is numbered according to its position in the sequence of clauses
comprising the proof. These defining clauses encode the assertion $u \leftrightarrow \mathrm{ITE}(x, u_1, u_0)$,
where ITE denotes the if-then-else operation, defined as $\mathrm{ITE}(x, y, z) = (x \land y) \lor (\overline{x} \land z)$.
As can be seen, the defining clauses are blocked with respect to extension variable $u$.
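As a minimal sketch of how a GETNODE function might record these clauses, the following Python fragment keeps a unique table and emits up to four defining clauses for each new node, dropping those that degenerate to tautologies. The class name `Manager`, the signed-integer literal convention, and the string leaf markers are assumptions made for illustration; they are not taken from the PGBDD source.

```python
L0, L1 = "L0", "L1"   # hypothetical leaf markers for the constants 0 and 1


class Manager:
    def __init__(self, num_inputs):
        self.next_var = num_inputs + 1   # extension variables follow the inputs
        self.unique = {}                 # (var, hi, lo) -> extension variable
        self.clauses = []                # defining clauses as (name, literals)

    def _lit(self, node, positive):
        """Literal for a child: a Boolean constant for leaves, else a signed int."""
        if node == L1:
            return positive
        if node == L0:
            return not positive
        return node if positive else -node

    def _add(self, name, lits):
        if any(l is True for l in lits):
            return                       # tautology: omitted from the proof
        self.clauses.append((name, [l for l in lits if l is not False]))

    def get_node(self, x, hi, lo):
        """Return the node for (x, hi, lo), creating it only if it is new."""
        key = (x, hi, lo)
        if key in self.unique:
            return self.unique[key]
        u = self.next_var
        self.next_var += 1
        self.unique[key] = u
        self._add("HD", [-x, -u, self._lit(hi, True)])    # x -> (u -> u1)
        self._add("LD", [x, -u, self._lit(lo, True)])     # not x -> (u -> u0)
        self._add("HU", [-x, self._lit(hi, False), u])    # x -> (u1 -> u)
        self._add("LU", [x, self._lit(lo, False), u])     # not x -> (u0 -> u)
        return u


m = Manager(num_inputs=2)
a = m.get_node(2, L1, L0)     # node for the function x2
b = m.get_node(1, a, L0)      # node for the function x1 AND x2
```

With two input variables, node `a` receives extension variable 3 and only two of its four clauses survive, because both children are leaves. Node `b` receives extension variable 4 and three clauses. The sketch assumes the caller never requests a node whose two children are equal.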


3.1 Generating BDD Representations of Clauses 


The BDD representation u of a clause C is generated by using the APPLYOR operation 
on the BDD representations of its literals. This BDD has a simple, linear structure with 
one node for each literal. Each successive node has a branch to leaf node $L_1$ when the
literal is true and to the next node in the chain when the literal is false. The proof that
$C \vdash u$ is based on this linear structure, employing the upward defining clauses HU and
LU for the nodes in the chain [28].


3.2 The APPLYAND Operation 


The key idea in generating proofs for the AND operation is to follow the recursive
structure of the Apply algorithm. We do this by integrating proof generation into the
APPLYAND procedure. The overall control flow is identical to the standard version,
except the function returns both a BDD node $w$ and a step number $s$. For arguments $u$
and $v$, the generated step $s$ has clause $\overline{u}\,\overline{v}\,w$ along with antecedents defining a resolution
proof of the implication $u \land v \to w$. We refer to this as the justification for the operation.
The operation cache is modified to hold both the returned node and the justifying step
number as values.


Terminal Cases

    Case        Result
    u = v       (u, ⊤)
    u = L0      (L0, ⊤)
    v = L0      (L0, ⊤)
    u = L1      (v, ⊤)
    v = L1      (u, ⊤)

APPLYANDRECUR(u, v)

    J ← ∅
    x ← min(Var(u), Var(v))
    if x = Var(u):
        u1, u0 ← Hi(u), Lo(u)
        J ← J ∪ {HD(u), LD(u)}
    else: u1, u0 ← u, u
    if x = Var(v):
        v1, v0 ← Hi(v), Lo(v)
        J ← J ∪ {HD(v), LD(v)}
    else: v1, v0 ← v, v
    w1, s1 ← APPLYAND(u1, v1)
    w0, s0 ← APPLYAND(u0, v0)
    J ← J ∪ {s1, s0}
    if w1 = w0:
        w ← w1
    else:
        w ← GETNODE(x, w1, w0)
        J ← J ∪ {HU(w), LU(w)}
    s ← JUSTIFYAND((u, v, w), J)
    AndCache((u, v)) ← (w, s)
    return (w, s)

Fig. 1. Terminal cases and recursive step of APPLYAND operation, modified for proof generation.
Each call returns both a node and a proof step.

Figure 1 shows the main components of the implementation. When the two arguments
are equal or one of the arguments is a leaf node, the recursion terminates.
These cases have tautologies as their justification. Failing a terminal case,
the code checks in the operation cache for matching arguments $u$ and $v$, returning the
cached result if found.

Failing the terminal case tests and the cache lookup, the program proceeds as shown
in the procedure APPLYANDRECUR. Here, the procedure branches on the variable
$x$ that is the minimum of the two root variables. The procedure accumulates a set
of steps J to be used in the implication proof. These include the two steps (possibly 
tautologies) from the two recursive calls. At the end, it invokes a function JUSTIFYAND 
to generate the required proof. It stores both the result node w and the proof step s in 
the operation cache, and it provides these values as the return values. 


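Left chain ($x = 1$):
    ANDH $\overline{u}_1\,\overline{v}_1\,w_1$ with WHU $\overline{x}\,\overline{w}_1\,w$ gives $\overline{x}\,\overline{u}_1\,\overline{v}_1\,w$
    with UHD $\overline{x}\,\overline{u}\,u_1$ gives $\overline{x}\,\overline{u}\,\overline{v}_1\,w$
    with VHD $\overline{x}\,\overline{v}\,v_1$ gives $\overline{x}\,\overline{u}\,\overline{v}\,w$
Right chain ($x = 0$):
    ANDL $\overline{u}_0\,\overline{v}_0\,w_0$ with WLU $x\,\overline{w}_0\,w$ gives $x\,\overline{u}_0\,\overline{v}_0\,w$
    with ULD $x\,\overline{u}\,u_0$ gives $x\,\overline{u}\,\overline{v}_0\,w$
    with VLD $x\,\overline{v}\,v_0$ gives $x\,\overline{u}\,\overline{v}\,w$
Root:
    resolving the two chain results on $x$ gives $\overline{u}\,\overline{v}\,w$

Fig. 2. Resolution proof for general step of the APPLYAND operation


Proof Generation for the General Case. Proving that the nodes generated by APPLYAND
satisfy the implication property proceeds by inducting on the structure of the argument
and result BDDs. That is, it can assume that the results $w_1$ and $w_0$ of the recursive calls
to arguments $u_1$ and $v_1$ and to $u_0$ and $v_0$ satisfy the implications $u_1 \land v_1 \to w_1$ and
$u_0 \land v_0 \to w_0$, and that these calls generated proof steps $s_1$ and $s_0$ justifying these
implications. Figure 2 shows the structure of the resolution proof for the general case,
where none of the equalities hold and the recursive calls do not yield tautologies. The
proof relies on the following clauses as antecedents, arising from the recursive calls and
from the defining clauses for nodes $u$, $v$, and $w$: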


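Term   Formula                           Clause                                        Term   Formula                                     Clause
ANDH   $u_1 \land v_1 \to w_1$           $\overline{u}_1\,\overline{v}_1\,w_1$         ANDL   $u_0 \land v_0 \to w_0$                     $\overline{u}_0\,\overline{v}_0\,w_0$
WHU    $x \to (w_1 \to w)$               $\overline{x}\,\overline{w}_1\,w$             WLU    $\overline{x} \to (w_0 \to w)$              $x\,\overline{w}_0\,w$
UHD    $x \to (u \to u_1)$               $\overline{x}\,\overline{u}\,u_1$             ULD    $\overline{x} \to (u \to u_0)$              $x\,\overline{u}\,u_0$
VHD    $x \to (v \to v_1)$               $\overline{x}\,\overline{v}\,v_1$             VLD    $\overline{x} \to (v \to v_0)$              $x\,\overline{v}\,v_0$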


Along the left, the clauses cover the case of x = 1, first resolving clause ANDH and 
WHU, then resolving the result first with clause UHD and then clause VHD. A similar 
progression occurs along the right covering the case of x = 0. The two chains are 
then merged by resolving on variable x to yield the final implication. As this figure 
illustrates, a total of seven resolution steps are required. These can be merged into two 
linear resolution chains, and so the proof generator produces at most two clauses per 
APPLYAND operation. 
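The seven resolution steps can be replayed mechanically. The sketch below is an illustration only: it encodes literals as strings (a leading `-` marks negation, which is an assumed convention), resolves along the left and right chains, and then merges them on $x$.

```python
def clause(*lits):
    return frozenset(lits)


def resolve(c1, c2, var):
    """Resolve two clauses on `var`; the clauses must clash on that variable."""
    pos, neg = var, "-" + var
    if pos in c1 and neg in c2:
        return (c1 - {pos}) | (c2 - {neg})
    if neg in c1 and pos in c2:
        return (c1 - {neg}) | (c2 - {pos})
    raise ValueError(f"clauses do not clash on {var}")


# Antecedents for the general case, as listed in the table above.
ANDH = clause("-u1", "-v1", "w1")
WHU = clause("-x", "-w1", "w")
UHD = clause("-x", "-u", "u1")
VHD = clause("-x", "-v", "v1")
ANDL = clause("-u0", "-v0", "w0")
WLU = clause("x", "-w0", "w")
ULD = clause("x", "-u", "u0")
VLD = clause("x", "-v", "v0")

# Left chain (x = 1): three resolution steps.
left = resolve(ANDH, WHU, "w1")
left = resolve(left, UHD, "u1")
left = resolve(left, VHD, "v1")

# Right chain (x = 0): three resolution steps.
right = resolve(ANDL, WLU, "w0")
right = resolve(right, ULD, "u0")
right = resolve(right, VLD, "v0")

# Seventh step: merge the two chains on x.
final = resolve(left, right, "x")
print(sorted(final))
```

The final clause contains exactly the literals for $\overline{u}$, $\overline{v}$, and $w$, which is the clause form of $u \land v \to w$.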


Proof Generation for Special Cases. The proof structure shown in Figure 2 only holds 
for the most general form of the recursion. However, there are many special cases, such 
as when the recursive calls yield tautologous results, when some of the child nodes are 
equal, and when the two recursive calls return the same node. 

Our method for handling both the general and special cases relies on the V-shaped 
structure of the proofs, as is illustrated in Figure 2. That is, there are two linear chains, 
one along the left and one along the right consisting of some subsequence of the fol- 
lowing clauses: 


$A_H$ = ANDH, WHU, UHD, VHD
$A_L$ = ANDL, WLU, ULD, VLD


These will be proper subsequences when some of the clauses are not included in the
set J in APPLYAND (Figure 1), or they are tautologies. In addition, some of the clauses
may be extraneous and therefore must not occur as antecedents.




Rather than trying to enumerate the special cases, we found it better to create a
general-purpose linear chain resolver that handles all of the cases in a uniform way. This
resolver is called on each of the clause sequences $A_H$ and $A_L$. It proceeds through
a sequence of clauses, discarding any tautologies and any clauses that do not resolve
with the result so far. It then emits the proof clauses with the selected antecedents.
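One way such a linear chain resolver could work is sketched below. This is an assumption-laden illustration, not the PGBDD code: clauses are frozensets of signed integers, a tautology is represented by `None`, and the chain simply starts from the first non-tautological clause (the text does not say how the real resolver picks its starting point).

```python
def try_resolve(c1, c2):
    """Return the resolvent if exactly one variable clashes, else None."""
    clashes = [lit for lit in c1 if -lit in c2]
    if len(clashes) != 1:
        return None
    lit = clashes[0]
    return (c1 - {lit}) | (c2 - {-lit})


def chain_resolve(steps):
    """Resolve along a list of (name, clause) pairs.

    Tautologies (None) and clauses that do not resolve with the running
    result are skipped. Returns the final clause and the antecedents used.
    """
    result, used = None, []
    for name, cl in steps:
        if cl is None:
            continue                      # tautology: discard
        if result is None:
            result, used = cl, [name]     # first usable clause starts the chain
            continue
        resolvent = try_resolve(result, cl)
        if resolvent is not None:         # otherwise the clause is extraneous
            result = resolvent
            used.append(name)
    return result, used


X, U, V, W, U1, V1, W1 = range(1, 8)

# General case: all four clauses of the left chain are needed.
general = [("ANDH", frozenset({-U1, -V1, W1})), ("WHU", frozenset({-X, -W1, W})),
           ("UHD", frozenset({-X, -U, U1})), ("VHD", frozenset({-X, -V, V1}))]
g_clause, g_used = chain_resolve(general)

# Special case: v1 is the leaf L1, so w1 = u1 and both ANDH and VHD are tautologies.
special = [("ANDH", None), ("WHU", frozenset({-X, -U1, W})),
           ("UHD", frozenset({-X, -U, U1})), ("VHD", None)]
s_clause, s_used = chain_resolve(special)
```

In the special case the chain yields a clause over $x$, $u$, and $w$ only, which subsumes the target clause, and only two antecedents are recorded.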


3.3 Testing Implication 


Terminal Cases

    Case                Result
    u = v               ⊤
    u = L0              ⊤
    v = L1              ⊤
    u = L1, v ≠ L1      Error
    v = L0, u ≠ L0      Error

PROVEIMPLICATIONRECUR(u, v)

    J ← ∅
    x ← min(Var(u), Var(v))
    if x = Var(u):
        u1, u0 ← Hi(u), Lo(u)
        J ← J ∪ {HD(u), LD(u)}
    else: u1, u0 ← u, u
    if x = Var(v):
        v1, v0 ← Hi(v), Lo(v)
        J ← J ∪ {HU(v), LU(v)}
    else: v1, v0 ← v, v
    s1 ← PROVEIMPLICATION(u1, v1)
    s0 ← PROVEIMPLICATION(u0, v0)
    J ← J ∪ {s1, s0}
    s ← JUSTIFYIMPLICATION((u, v), J)
    ImplyCache((u, v)) ← s
    return s


Fig. 3. Terminal cases and recursive step of PROVEIMPLICATION operation


When the existential quantification operation applied to node $u$ generates node $v$,
the program generates a proof that $u \to v$ by calling procedure PROVEIMPLICATION
with $u$ and $v$ as arguments. This procedure has the same recursive structure as the
Apply algorithm, except that it does not generate any new nodes. It only returns the
step number for a proof of the clause $\overline{u}\,v$. It uses an operation cache, but only to hold
proof step numbers. Figure 3 shows the terminal cases for this procedure, as well as the 
recursion that occurs when neither a terminal case applies nor are the arguments found 
in the operation cache. A failure of the implication test indicates an error in the solver, 
and so it signals a fatal error if the implication does not hold. 
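The control flow of the implication test can be sketched without the proof bookkeeping. In the fragment below, nodes are nested tuples `(var, hi, lo)` and leaves are the Python Booleans; both choices, and the function names, are assumptions for illustration. The real procedure returns a proof step number, whereas this sketch only returns `True` or raises an error.

```python
import math

L0, L1 = False, True          # leaves; nonterminal nodes are (var, hi, lo) tuples


def var(u):
    """Variable index of a node; leaves sort after every variable."""
    return math.inf if isinstance(u, bool) else u[0]


def cofactors(u, x):
    """Children of u with respect to x, or (u, u) if u does not test x."""
    return (u[1], u[2]) if var(u) == x else (u, u)


def prove_implication(u, v, cache=None):
    cache = {} if cache is None else cache
    if u == v or u is L0 or v is L1:
        return True                       # terminal cases justified by tautologies
    if u is L1 or v is L0:
        raise ValueError("implication does not hold")
    if (u, v) in cache:
        return cache[(u, v)]
    x = min(var(u), var(v))
    u1, u0 = cofactors(u, x)
    v1, v0 = cofactors(v, x)
    prove_implication(u1, v1, cache)
    prove_implication(u0, v0, cache)
    cache[(u, v)] = True
    return True


x2 = (2, L1, L0)              # the function x2
f = (1, x2, L0)               # the function x1 AND x2
g = x2                        # result of existentially quantifying x1 from f
ok = prove_implication(f, g)
```

Here `f` implies `g`, as expected after quantification. Calling `prove_implication(g, f)` raises `ValueError`, mirroring the fatal error signaled when the test fails.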

Each recursive step accumulates up to six proof steps as the set J to be used in the 
implication proof: 


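Term   Formula                    Clause                                 Term   Formula                               Clause
IMH    $u_1 \to v_1$              $\overline{u}_1\,v_1$                  IML    $u_0 \to v_0$                         $\overline{u}_0\,v_0$
UHD    $x \to (u \to u_1)$        $\overline{x}\,\overline{u}\,u_1$      ULD    $\overline{x} \to (u \to u_0)$        $x\,\overline{u}\,u_0$
VHU    $x \to (v_1 \to v)$        $\overline{x}\,\overline{v}_1\,v$      VLU    $\overline{x} \to (v_0 \to v)$        $x\,\overline{v}_0\,v$


Left chain ($x = 1$):
    UHD $\overline{x}\,\overline{u}\,u_1$ with IMH $\overline{u}_1\,v_1$ gives $\overline{x}\,\overline{u}\,v_1$
    with VHU $\overline{x}\,\overline{v}_1\,v$ gives $\overline{x}\,\overline{u}\,v$
Right chain ($x = 0$):
    ULD $x\,\overline{u}\,u_0$ with IML $\overline{u}_0\,v_0$ gives $x\,\overline{u}\,v_0$
    with VLU $x\,\overline{v}_0\,v$ gives $x\,\overline{u}\,v$
Root:
    resolving the two chain results on $x$ gives $\overline{u}\,v$

Fig. 4. Resolution proof for general step of the PROVEIMPLICATION operation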


The resolution proof for the general case is shown in Figure 4. It has a similar structure 
to the proof for the APPLYAND operation, with two linear chains combined by a res- 
olution on variable x. Our same general-purpose linear chain resolver can handle both 
the general case and the many special cases that arise. 


4 Experimental Results 


We implemented the proof-generating SAT solver PGBDD (for Proof-Generating BDD).
It is written entirely in Python and consists of around 2000 lines of code, including a
BDD package, support for generating extended-resolution proofs, and the overall SAT
solver framework.¹

Although slow, it can handle large enough benchmarks to provide useful insights 
into the potential for a BDD-based SAT solver to generate proofs of challenging prob- 
lems, especially when quantification is supported. It generates proofs in the LRAT for- 
mat [9]. 

Our BDD package supports mark-and-sweep garbage collection. It starts the mark- 
ing using the root nodes for all active terms in the sequence of root nodes u1, u2,.... 
Following the marking phase, it traverses the unique table and eliminates the unmarked 
nodes. It also traverses the operation caches and eliminates any entries for which one of 
the argument nodes or the result node is unmarked. When a node is deleted, the solver 
can also direct the proof checker to delete its defining clauses. Similarly, when an entry 
is deleted from the operation cache, the solver can direct the proof checker to delete 
those clauses added while generating the justification for the entry. 
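A compact sketch of this mark-and-sweep pass follows. The data layout is assumed for illustration: `nodes` maps a node identifier to its `(var, hi, lo)` triple, leaves are the identifiers 0 and 1, and `and_cache` maps an argument pair to a `(result, step)` pair. The function returns the deleted nodes and proof steps, which a solver could translate into deletion commands for the proof checker.

```python
def garbage_collect(nodes, and_cache, roots):
    """Mark nodes reachable from `roots`, then sweep the table and the cache."""
    marked = {0, 1}                       # leaves are always live
    stack = list(roots)
    while stack:                          # marking phase
        u = stack.pop()
        if u in marked:
            continue
        marked.add(u)
        _, hi, lo = nodes[u]
        stack.extend((hi, lo))

    dead_nodes = [u for u in nodes if u not in marked]
    for u in dead_nodes:                  # sweep the unique table
        del nodes[u]

    dead_steps = []
    for key in list(and_cache):           # sweep the operation cache
        w, step = and_cache[key]
        if not {key[0], key[1], w} <= marked:
            dead_steps.append(step)
            del and_cache[key]
    return dead_nodes, dead_steps


nodes = {2: (2, 1, 0), 3: (1, 2, 0), 4: (1, 0, 2), 5: (2, 0, 1)}
and_cache = {(3, 2): (3, 10), (4, 2): (0, 11), (3, 5): (0, 12)}
dead_nodes, dead_steps = garbage_collect(nodes, and_cache, roots=[3])
```

With node 3 as the only root, nodes 4 and 5 are unreachable, so they are removed together with the two cache entries that mention them.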

In addition to the input CNF file, the program can accept a variable-ordering file, 
mapping the input variables in the CNF to their levels in the BDD. 

The solver supports three different evaluation mechanisms: 


Linear: Form the conjunction of the clauses, according to their order in the input file.
No quantification is performed. This matches the operation described in [28].
Bucket Elimination: Place the BDDs representing the clauses into buckets according
to the level of their topmost variable. Then process the buckets from lowest to highest.
While a bucket has more than one element, repeatedly remove two elements,
form their conjunction, and place the result in the bucket designated by the topmost
variable. Once the bucket has a single element, existentially quantify the topmost
variable and place the result in the appropriate bucket [12]. This matches the operation
described in [21].

Scheduled: Perform operations as specified by a scheduling file. This file contains a
sequence of lines, each providing a command in a simple, stack-based notation:


c $c_1, \ldots, c_k$    Generate and push the BDDs for the specified clauses.
a $m$                   Pop and conjoin the top $m$ elements. Push the result.
q $v_1, \ldots, v_k$    Quantify the top element by the specified variables.


¹ The solver, along with code for generating and testing a set of benchmarks, is available at
https://github.com/rebryant/pgbdd-artifact.


In generating benchmarks, we wrote programs to generate the CNF files, the variable 
orderings, and the schedules in a unified framework. 
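To make the stack-based notation concrete, here is a toy interpreter for the three commands. It is only a sketch: functions are represented as explicit sets of satisfying assignments rather than BDDs, and the concrete file syntax (whitespace-separated arguments, 1-based clause and variable numbers) is an assumption, since the text does not spell it out.

```python
from itertools import product


def run_schedule(num_vars, clauses, schedule):
    """Execute a schedule and return the final stack of model sets."""
    universe = list(product((False, True), repeat=num_vars))

    def models(cl):
        # Assignments satisfying a clause given as a list of signed integers.
        return frozenset(a for a in universe
                         if any(a[abs(l) - 1] == (l > 0) for l in cl))

    def exists(f, v):
        # Existential quantification: allow either value of variable v.
        flipped = frozenset(a[:v - 1] + (not a[v - 1],) + a[v:] for a in f)
        return f | flipped

    stack = []
    for line in schedule.splitlines():
        parts = line.split()
        if not parts:
            continue
        cmd, args = parts[0], [int(p) for p in parts[1:]]
        if cmd == "c":                    # push the specified clauses
            stack.extend(models(clauses[i - 1]) for i in args)
        elif cmd == "a":                  # conjoin the top m elements
            (m,) = args
            top = stack[-m:]
            del stack[-m:]
            stack.append(frozenset.intersection(*top))
        elif cmd == "q":                  # quantify the top element
            f = stack.pop()
            for v in args:
                f = exists(f, v)
            stack.append(f)
        else:
            raise ValueError(f"unknown command {cmd!r}")
    return stack


clauses = [[1, 2], [-1, 2], [-2]]         # (x1 or x2), (not x1 or x2), (not x2)
final = run_schedule(2, clauses, "c 1 2\na 2\nq 1\nc 3\na 2")
```

For this small unsatisfiable formula the final stack holds a single empty set, the explicit-set analogue of reaching leaf $L_0$.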

For all of our benchmarks, we report the total number of clauses in the proof, including
the input clauses, the defining clauses for the extension variables (up to four
per BDD node generated), and the derived clauses (one per input clause and up to two
per result inserted into either AndCache or ImplyCache).

We compare the performance of our BDD-based SAT solver with that of KISSAT, 
the winner of the 2020 SAT competition [3], representing the state of the art in search- 
based SAT solvers. 


4.1 Mutilated Chessboard 


The mutilated chessboard problem considers an $n \times n$ chessboard, with the corners on
the upper left and the lower right removed. It attempts to tile the board with dominos, 
with each domino covering two squares. Since the two removed squares had the same 
color, and each domino covers one white and one black square, no tiling is possible. 
This problem has been well studied in the context of resolution proofs, for which it can 
be shown that any proof must be of exponential size [1]. 

A standard CNF encoding involves defining Boolean variables to represent the
boundaries between adjacent squares, set to 1 when a domino spans the two squares,
and set to 0 otherwise. The clauses then encode an Exactly1 constraint for each square,
requiring each square to share a domino with exactly one of its neighbors. We label the
variables representing a horizontal boundary between a square and the one below as
$y_{i,j}$, with $1 \le i < n$ and $1 \le j \le n$. The variables representing the vertical boundaries
are labeled $x_{i,j}$, with $1 \le i \le n$ and $1 \le j < n$. With a mutilated chessboard, we have
$y_{1,1} = x_{1,1} = y_{n-1,n} = x_{n,n-1} = 0$.

As the log-log plot in Figure 5 shows, PGBDD has exponential performance when 
using linear conjunction or bucket elimination. Indeed, KISSAT outperforms PGBDD 
when operating in these modes. However, KISSAT can also be seen to have exponential 
performance—to reach n = 22, it generates a proof with over 136 million clauses. 

On the other hand, another approach, inspired by symbolic model checking [7],
yields polynomial performance. It is based on the following observation: when processing
the columns from left to right, the only information required to place dominos
in column $j$ is the identity of those rows $i$ for which a domino crosses horizontally from
$j - 1$ to $j$. This information is encoded in the values of $x_{i,j-1}$ for $1 \le i \le n$.

Fig. 5. Total number of clauses in proofs of $n \times n$ mutilated chess boards. The proofs using the
column scanning approach grow as $n^{2.69}$.


Let us group the variables into columns, with $X_j$ denoting variables $x_{1,j}, \ldots, x_{n,j}$,
and $Y_j$ denoting variables $y_{1,j}, \ldots, y_{n-1,j}$. Scanning the board from left to right,
consider $X_j$ to encode the “state” of processing after completing column $j$. As the scanning
process reaches column $j$, there is a characteristic function $\sigma_{j-1}(X_{j-1})$ describing the
set of allowed crossings of horizontally-oriented dominos from column $j - 1$ into
column $j$. No other information about the configuration of the board to the left is required.
The characteristic function after column $j$ can then be computed as:


$$\sigma_j(X_j) = \exists X_{j-1}\,\bigl[\sigma_{j-1}(X_{j-1}) \land \exists Y_j\, T_j(X_{j-1}, Y_j, X_j)\bigr] \qquad (1)$$


where $T_j(X_{j-1}, Y_j, X_j)$ is a “transition relation” consisting of the conjunction of the
Exactly1 constraints for column $j$. From this, we can existentially quantify the variables
$Y_j$ to obtain a BDD encoding all compatible combinations of the variables $X_{j-1}$ and
$X_j$. By conjoining this with the characteristic function for column $j - 1$ and existentially
quantifying the variables $X_{j-1}$, we obtain the characteristic function for column
$j$. With a mutilated chessboard, we generate leaf node $L_0$ in attempting the final conjunction.
Note that Equation (1) does not represent a reformulation of the mutilated
chessboard problem. It simply defines a way to schedule the conjunction and quantification
operations over the input clauses and variables.
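The scheduling idea behind Equation (1) can be tried on small boards with explicit state sets in place of BDDs. The sketch below is illustrative only: it uses 0-based indices, represents each state $X_j$ as a tuple of bits, and enumerates assignments exhaustively, so it is exponential in $n$ and says nothing about the BDD sizes reported here. The function names are hypothetical.

```python
from itertools import product


def column_ok(n, j, prev, vert, cur, removed):
    """Check the Exactly1 constraint of every square in column j."""
    for i in range(n):
        up = vert[i - 1] if i > 0 else 0
        down = vert[i] if i < n - 1 else 0
        count = prev[i] + cur[i] + up + down
        # A removed square must have no domino; any other square exactly one.
        if count != (0 if (i, j) in removed else 1):
            return False
    return True


def transition(n, j, removed):
    """Pairs (prev, cur) for which some choice of Y_j satisfies column j."""
    zero = (0,) * n
    # No domino can cross out of the last column.
    cur_choices = list(product((0, 1), repeat=n)) if j < n - 1 else [zero]
    pairs = set()
    for prev in product((0, 1), repeat=n):
        for cur in cur_choices:
            if any(column_ok(n, j, prev, vert, cur, removed)
                   for vert in product((0, 1), repeat=n - 1)):
                pairs.add((prev, cur))    # inner quantification over Y_j
    return pairs


def tileable(n, removed=frozenset()):
    states = {(0,) * n}                   # nothing crosses into the first column
    for j in range(n):
        pairs = transition(n, j, removed)
        # Conjoin with the previous states and quantify them away.
        states = {cur for prev, cur in pairs if prev in states}
    return bool(states)


corners = frozenset({(0, 0), (3, 3)})
print(tileable(4), tileable(4, corners))
```

For the intact $4 \times 4$ board the final state set is nonempty, while removing two opposite corners leaves it empty, the counterpart of generating leaf $L_0$.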

In our experiments, we found that this scanning reaches a fixed point after pro- 
cessing n/2 columns. That is, from that column onward, the characteristic functions 
become identical, except for a renaming of variables. This indicates that the set of all 
possible horizontal configurations stabilizes halfway across the board. Moreover, the 
BDD representations of the states grow as $O(n^2)$. For n = 124, the largest has just
3,969 nodes.

One important rule-of-thumb in symbolic model checking is that the successive
values of the next-state variables must be adjacent in the variable ordering. Furthermore,
the vertical variables in $Y_j$ must be close to their counterparts in $X_{j-1}$ and $X_j$. Both
objectives can be achieved by ordering the variables row-wise, interleaving the variables
$x_{i,j}$ and $y_{i,j}$, ordering first by row index $i$ and then by column index $j$. This requires
the quantification operations of Equation (1) to be performed on non-root variables.

Figure 5 shows that the “column-scanning” approach yields performance scaling as
$n^{2.69}$, allowing us to handle cases up to n = 124. Keep in mind that the problem size
here should be measured as $n^2$, the number of squares in the board. Thus, a problem
instance with n = 124 is over 31 times larger than one with n = 22 (the upper limit
reached by KISSAT), in terms of the number of input variables and clauses. Indeed,
the case of n = 22 is straightforward for PGBDD, requiring only a few seconds and
generating a proof with 161,694 clauses.² By contrast, KISSAT requires 12.6 hours and
generates over 136 million clauses.
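The size comparison can be checked with a few lines of arithmetic. The sketch below also computes a rough two-point slope on the log-log scale from the n = 22 proof size quoted above and the n = 124 clause counts listed in Table 1. That slope is only a sanity check under the assumption that both totals count the same kinds of clauses; it is not the fit used for Figure 5.

```python
import math

# Ratio of board sizes, measured in squares.
squares_ratio = 124**2 / 22**2

# Total proof clauses: input + defining + derived clauses for n = 124 (Table 1).
total_124 = 106_136 + 12_127_031 + 5_348_303
total_22 = 161_694

# Two-point estimate of the growth exponent.
slope = math.log(total_124 / total_22) / math.log(124 / 22)

print(f"size ratio  {squares_ratio:.2f}")
print(f"total n=124 {total_124:,}")
print(f"slope       {slope:.2f}")
```

The size ratio comes out a little under 32, consistent with “over 31 times larger”, and the two-point slope lands close to the reported exponent.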

The plot labeled “No Quantification” demonstrates the importance of including ex- 
istential quantification in solving this problem. These data were generated by using the 
same schedule as with column scanning, but with all quantification operations omitted. 
As can be seen, this approach could not scale beyond n = 14. 

Most attempts to generate propositional proofs of the mutilated chessboard have 
exponential performance. No solver in the 2018 SAT competition could handle the in- 
stance with n = 20. Heule, Kiesl, and Biere [19] devised a problem-specific approach 
that could generate proofs up to n = 50 by exploiting special symmetries in the prob- 
lem, using a set of rewriting rules to dramatically reduce the search space. Our approach 
also exploits symmetries in the problem, but by exploiting a way to compactly encode 
the set of possible configurations between successive columns. Other than these two, 
we know of no other approach for generating polynomially-sized propositional proofs 
for the problem. 


4.2 Pigeonhole Problem 


The pigeonhole problem is one of the most studied problems in propositional reasoning. 
Given a set of n holes and a set of n+1 pigeons, it asks whether there is an assignment of 
pigeons to holes such that 1) every pigeon is in some hole, and 2) every hole contains at 
most one pigeon. The answer is no, of course, but any resolution proof for this must be 
of exponential length [15]. Groote and Zantema have shown that any BDD-based proof 
of the principle that only uses the Apply algorithm must be of exponential size [14]. On 
the other hand, Cook constructed an extended resolution proof of size $O(n^4)$, in part to
demonstrate the expressive power of extended resolution [8]. 

We consider two encodings of the problem. Both are based on a set of variables $p_{i,j}$
for $1 \le i \le n$ and $1 \le j \le n + 1$, with the interpretation that pigeon $j$ is assigned
to hole $i$. Encoding the property that each pigeon $j$ is assigned to some hole can be
expressed as a single clause:


$$\mathit{Pigeon}_j = \bigvee_{i=1}^{n} p_{i,j}$$


² All times reported here were measured on a 3 GHz Intel i7-9700 CPU with 16 GB of memory.

Fig. 6. Total number of clauses in proofs of pigeonhole problem for n holes. Using a direct
encoding led to exponential performance, but using a Tseitin encoding and column scanning gives
proofs that grow roughly as $n^3$.


Encoding the property that each hole $i$ contains at most one pigeon can be done in
two different ways. A direct encoding simply states that for any pair of pigeons $j$ and
$k$, at least one of them must not be in hole $i$:


$$\mathit{Direct}_i = \bigwedge_{j=1}^{n+1} \; \bigwedge_{k=j+1}^{n+1} \left( \overline{p}_{i,j} \lor \overline{p}_{i,k} \right)$$


This encoding requires $O(n^2)$ clauses for each hole, yielding a total CNF size of $O(n^3)$.

A second, Tseitin encoding introduces Tseitin variables to track which holes are
occupied, starting with pigeon 1 and working upward. We use an encoding published
by Sinz [27] that uses Tseitin variables $s_{i,j}$ for $1 \le i \le n$ and $1 \le j \le n$, where $s_{i,j}$
equals 1 if a pigeon $j'$ occupies hole $i$ for some $j' \le j$. It requires $3n - 1$ clauses and
$n$ Tseitin variables per hole, yielding an overall CNF size of $O(n^2)$.
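The two encodings can be generated and counted with a short script. The Tseitin branch below assumes the standard sequential-counter form of an at-most-one constraint, which matches the stated counts of $3n - 1$ clauses and $n$ Tseitin variables per hole; whether it matches the exact clause set of [27] is an assumption. The variable numbering and function names are likewise illustrative.

```python
from itertools import combinations, product


def pigeonhole(n, tseitin=False):
    """CNF for n holes and n + 1 pigeons; returns (num_vars, clauses)."""
    num_p = n * (n + 1)

    def p(i, j):                          # pigeon j is in hole i (1-based)
        return (i - 1) * (n + 1) + j

    def s(i, j):                          # some pigeon j' <= j is in hole i
        return num_p + (i - 1) * n + j

    # Each pigeon is in some hole.
    clauses = [[p(i, j) for i in range(1, n + 1)] for j in range(1, n + 2)]
    for i in range(1, n + 1):
        if not tseitin:                   # direct encoding: one clause per pair
            clauses += [[-p(i, j), -p(i, k)]
                        for j, k in combinations(range(1, n + 2), 2)]
            continue
        clauses.append([-p(i, 1), s(i, 1)])
        for j in range(2, n + 1):
            clauses.append([-p(i, j), s(i, j)])
            clauses.append([-s(i, j - 1), s(i, j)])
            clauses.append([-p(i, j), -s(i, j - 1)])
        clauses.append([-p(i, n + 1), -s(i, n)])
    return num_p + (n * n if tseitin else 0), clauses


def is_satisfiable(num_vars, clauses):
    """Brute-force check, only sensible for tiny instances."""
    return any(all(any(a[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for a in product((False, True), repeat=num_vars))


num_vars, cnf = pigeonhole(150, tseitin=True)
print(num_vars, len(cnf))
```

For n = 150 the generated variable and clause counts agree with the Pigeon-Tseitin-150 column of Table 1.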

As is illustrated by the log-log plots of Figure 6, this choice of encoding not only 
affects the CNF size, it dramatically affects the size of the proofs generated by PGBDD. 
With a direct encoding, we could not find any combination of evaluation strategy or 
variable ordering that could go beyond n = 16. Similarly, the Tseitin encoding did 
not help when using linear evaluation or bucket elimination. Indeed, we see KISSAT, 
using the Tseitin encoding, matching or exceeding our program for these cases, but all 
of these have exponential performance. (KISSAT could only reach n = 15 when using 
a direct encoding.) 

On the other hand, the column scanning approach used for the mutilated chessboard
can also be applied to the pigeonhole problem when the Tseitin encoding is used.
Consider an array with hole $i$ represented by row $i$ and pigeon $j$ represented by column
$j$. Let $S_j$ represent the Tseitin variables $s_{i,j}$ for $1 \le i \le n$. The “state” is then
encoded in these Tseitin variables. In processing pigeon $j$, we can assume that the possible
combinations of values of Tseitin variables $S_{j-1}$ are encoded by a characteristic
function $\sigma_{j-1}(S_{j-1})$. In addition, we incorporate into this characteristic function the
requirement that each pigeon $k$, for $1 \le k \le j - 1$, is assigned to some hole. Letting $P_j$
denote the variables $p_{i,j}$ for $1 \le i \le n$, the characteristic function at column $j$ can then
be expressed as:


$$\sigma_j(S_j) = \exists S_{j-1}\,\bigl[\sigma_{j-1}(S_{j-1}) \land \exists P_j\, T_j(S_{j-1}, P_j, S_j)\bigr] \qquad (2)$$


where the “transition relation” $T_j$ consists of the clauses associated with the Tseitin
variables, plus the clause encoding constraint $\mathit{Pigeon}_j$. As with the mutilated chessboard,
having a proper variable ordering is critical to the success of a column scanning
approach. We interleave the ordering of the variables $p_{i,j}$ and $s_{i,j}$, ordering them first
by $i$ (holes) and then by $j$ (pigeons).

Figure 6 demonstrates the effectiveness of the column-scanning approach. We were
able to handle instances up to n = 150, with an overall performance trend of roughly $n^3$.
Our achieved performance therefore improves on Cook’s bound of $O(n^4)$. A SAT-solving
method developed by Heule, Kiesl, Seidl, and Biere can generate short proofs of
multiple encodings of pigeonhole formulas, including the direct encoding [20]. These
proofs are similar to ours after transforming them into the same proof format, and the
size is also $O(n^3)$ [17].

Unlike with the mutilated chessboard, the scanning does not reach a fixed point. 
Instead, the BDDs start very small, because they must encode the locations of only 
a small number of occupied holes. They reach their maximum size at pigeon n/2, as 
the number of combinations for occupied and unoccupied holes reaches its maximum. 
Then the BDD sizes drop off as the encoding needs to track the positions of a decreasing 
number of unoccupied holes. Fortunately, all of these BDDs grow quadratically with n, 
reaching a maximum of 5,702 nodes for n = 150. 


4.3 Evaluation 


Overall, our results demonstrate the potential for generating small proofs of unsatisfia- 
bility using BDDs. We have achieved polynomial performance for problems for which 
search-based SAT solvers have exponential performance. 

Other studies have compared BDDs to search-based SAT on a variety of bench- 
mark problems. Several of these observed exponential performance for BDD-based 
solvers for problems for which we have obtained polynomial performance. Uribe and 
Stickel [31] ran experiments with the mutilated chessboard problem, but they did not 
do any variable quantification. Pan and Vardi [25] applied a variety of scheduling and 
variable ordering strategies for the mutilated chessboard and pigeonhole problems. Al- 
though they were able to get better performance than with a search-based SAT solver, 
they still observed exponential scaling. Obtaining polynomial performance for these 
problems requires more problem-specific approaches than the ones they considered. 

Table 1 provides some performance data for the largest instances solved for the two
benchmark problems. A first observation is that these problems are very large, with tens
of thousands of input variables and clauses.

Table 1. Summary data for the largest problems solved


                          Chessboard      Pigeonhole
Instance                  Chess-124       Pigeon-Tseitin-150
Input variables           30,500          45,150
Total BDD nodes           3,409,112       17,861,833
Maximum live nodes        198,967         225,446
Input clauses             106,136         67,501
Defining clauses          12,127,031      62,585,397
Derived clauses           5,348,303       81,019,084
Maximum live clauses      751,944         1,297,039
SAT time (secs)           5,366           5,206
Checking time (secs)      30              240


The total number of BDD nodes indicates the total number generated by the function 
GETNODE, and for which extension variables are created. These are numbered in the 
millions, and far exceed the number of input variables. On the other hand, the maximum 
number of live nodes shows the effectiveness of garbage collection—at any given point 
in the program, at most 6% of the total number of nodes must be stored in the unique 
table and tracked in the operation caches. Garbage collection also keeps the number 
of clauses that must be tracked by the proof checker below 5% of the total number 
of clauses. The elapsed time for the SAT solver ranges up to 1.5 hours. We believe, 
however, that an implementation in a more performant language would reduce these 
times greatly. The checking times are shown for an LRAT proof checker written in the 
C programming language. The proofs have also been checked with a formally verified 
proof checker based on the HOL theorem prover [29]. 
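The percentages quoted above follow directly from Table 1. The short calculation below recomputes them, taking the total number of clauses to be the sum of the input, defining, and derived clauses (an interpretation consistent with how proof sizes are reported in this section).

```python
table1 = {
    "Chess-124": {
        "total_nodes": 3_409_112, "live_nodes": 198_967,
        "clauses": 106_136 + 12_127_031 + 5_348_303, "live_clauses": 751_944,
        "sat_secs": 5_366,
    },
    "Pigeon-Tseitin-150": {
        "total_nodes": 17_861_833, "live_nodes": 225_446,
        "clauses": 67_501 + 62_585_397 + 81_019_084, "live_clauses": 1_297_039,
        "sat_secs": 5_206,
    },
}

ratios = {}
for name, d in table1.items():
    node_frac = d["live_nodes"] / d["total_nodes"]
    clause_frac = d["live_clauses"] / d["clauses"]
    ratios[name] = (node_frac, clause_frac)
    print(f"{name}: live nodes {node_frac:.2%}, live clauses {clause_frac:.2%}, "
          f"SAT time {d['sat_secs'] / 3600:.2f} h")
```

Both instances stay under the stated 6% bound for live nodes and the 5% bound for live clauses, with the chessboard instance being the one closest to each limit.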


5 Conclusion 


Biere, Sinz, and Jussila [21,28] made the critical link between BDDs and extended 
resolution proofs. We have shown that adding the ability to perform arbitrary existential 
quantification can greatly increase the performance of a proof-generating, BDD-based 
SAT solver. 


Generating proofs for the two benchmark problems required special insights into
their structure and then crafting evaluation mechanisms to exploit their properties. We 
believe, however, that the column scanning approach we employed could be generalized 
and made more automatic. 

The ability to generate correctness proofs in a BDD-based SAT solver invites us to 
consider generating proofs for other tasks to which BDDs are applied, including QBF 
solving, model checking, and model counting. Perhaps a proof of unsatisfiability could 
provide a useful building block for constructing correctness proofs for these other tasks. 

Abstract. This paper introduces a bounded model checking (BMC) 
algorithm for hyperproperties expressed in HyperLTL, which — to the 
best of our knowledge — is the first such algorithm. Just as the classic 
BMC technique for LTL primarily aims at finding bugs, our approach 
also targets identifying counterexamples. BMC for LTL is reduced to 
SAT solving, because LTL describes a property via inspecting individual 
traces. Our BMC approach naturally reduces to QBF solving, as Hyper- 
LTL allows explicit and simultaneous quantification over multiple traces. 
We report on successful and efficient model checking, implemented in our 
tool called HyperQube, of a rich set of experiments on a variety of case 
studies, including security, concurrent data structures, path planning for 
robots, and mutation testing. 


1 Introduction 


Hyperproperties [10] have been shown to be a powerful framework for specifying 
and reasoning about important classes of requirements that were not possible 
with trace-based languages such as the classic temporal logics. Examples include 
information-flow security, consistency models in concurrent computing [6], and 
robustness models in cyber-physical systems [5,35]. The temporal logic Hyper- 
LTL [9] extends LTL by allowing explicit and simultaneous quantification over 
execution traces, describing the property of multiple traces. For example, the 
security policy observational determinism can be specified by the following 
HyperLTL formula: $\forall\pi_A.\forall\pi_B.\ (o_{\pi_A} \leftrightarrow o_{\pi_B})\ \mathcal{W}\ \neg(i_{\pi_A} \leftrightarrow i_{\pi_B})$, which stipulates that 
every pair of traces $\pi_A$ and $\pi_B$ have to agree on the value of the (public) output 
$o$ as long as they agree on the value of the (secret) input $i$, where '$\mathcal{W}$' denotes 
the weak until operator. 

There has been a recent surge of model checking techniques for HyperLTL 
specifications [9, 12,22, 24]. These approaches employ various techniques (e.g., 
alternating automata, model counting, strategy synthesis, etc.) to verify hyperproperties. 
However, they generally fall short of proposing a general push-button 
method to deal with identifying bugs with respect to HyperLTL formulas involving 
quantifier alternation. Indeed, quantifier alternation has been shown to generally 
elevate the complexity class of model checking HyperLTL specifications in 
different shapes of models [2,9]. For example, consider the simple Kripke structure 
$K$ in Fig. 1 and HyperLTL formulas $\varphi_1 = \forall\pi_A.\forall\pi_B.\Box(p_{\pi_A} \rightarrow p_{\pi_B})$ and 
$\varphi_2 = \forall\pi_A.\exists\pi_B.\Diamond\Box(p_{\pi_A} \leftrightarrow p_{\pi_B})$. Proving that $K \models \varphi_1$ (where traces for $\pi_A$ 
and $\pi_B$ are taken from $K$) can be reduced to building the self-composition of $K$ 
and applying standard LTL model checking, resulting in worst-case complexity 
$|K|^2$ in the size of the system. On the contrary, proving that $K \models \varphi_2$ is not as 
straightforward. In the worst case, this requires a subset generation to encode 
the existential quantifier within the Kripke structure, resulting in a $|K| \cdot 2^{|K|}$ 
blow-up. In addition, the quantification is over traces rather than states, adding to 
the complexity of reasoning. 

Fig. 1: A Kripke structure. 

Following the great success of bounded model checking (BMC) for LTL 
specifications [8], in this paper, we propose a BMC algorithm for HyperLTL. To the 
best of our knowledge, this is the first such algorithm. Just as BMC for LTL is reduced 
to SAT solving to search for a counterexample trace whose length is bounded by some 
integer $k$, we reduce BMC for HyperLTL to QBF solving to be able to deal with 
quantified counterexample traces in the input model. More formally, given a HyperLTL 
formula, e.g., $\varphi = \forall\pi_A.\exists\pi_B.\psi$, and a family of Kripke structures 
$\mathcal{K} = \langle K_A, K_B \rangle$ (one per trace variable), the reduction involves three main 
components. First, the transition relation of $K_\pi$ (for every $\pi$) is represented by a 
Boolean encoding $\llbracket K_\pi \rrbracket$. Secondly, the inner LTL subformula $\psi$ is translated to a 
Boolean representation $\llbracket \psi \rrbracket$ in a similar fashion to the BMC unrolling technique 
for LTL. This way, the QBF encoding for a bound $k \geq 0$ roughly appears as: 

$$\llbracket \mathcal{K}, \neg\varphi \rrbracket_k = \exists \overline{x_A}.\forall \overline{x_B}.\ \llbracket K_A \rrbracket_k \wedge \big(\llbracket K_B \rrbracket_k \rightarrow \llbracket \neg\psi \rrbracket_k\big) \qquad (1)$$

where the vectors of Boolean variables $\overline{x_A}$ (respectively, $\overline{x_B}$) are used to represent 
the states and propositions of $K_A$ (resp. $K_B$) for steps from 0 to $k$. Formulas 
$\llbracket K_A \rrbracket_k$ and $\llbracket K_B \rrbracket_k$ are the unrollings of $K_A$ (using $\overline{x_A}$) and $K_B$ (using $\overline{x_B}$), and 
$\llbracket \neg\psi \rrbracket_k$ (which uses both $\overline{x_A}$ and $\overline{x_B}$) is the fixpoint Boolean encoding of $\neg\psi$. 
The proposed technique in this paper does not incorporate a loop condition, as 
implementing such a condition for multiple traces is not straightforward. This, of 
course, comes at the cost of lacking a completeness result. 

While our QBF encoding is a natural generalization of BMC for HyperLTL, 
the first contribution of this paper is a more refined view of how to interpret 
the behavior of the formula beyond the unrolling depth $k$. Consider LTL formula 
$\forall\pi.\Box p_\pi$. BMC for LTL attempts to find a counterexample by unrolling 
the model and checking the satisfiability of $\exists\pi.\Diamond\neg p_\pi$ up to bound $k$. Now consider 
LTL formula $\forall\pi.\Diamond p_\pi$, whose negation is $\exists\pi.\Box\neg p_\pi$. In classic BMC, due to 
its pessimistic handling of $\Box$, the unsatisfiability of the formula cannot be 
established in the finite unrolling (handling these formulas requires either a looping 
condition or reaching the diameter of the system). This is because $\Box\neg p_\pi$ is not 
sometimes finitely satisfiable (SFS), in the terminology introduced by Havelund 
and Peled [27], meaning that not all satisfying traces of $\Box p_\pi$ have a finite prefix 
that witnesses the satisfiability. 

We propose a method that allows us to interpret a wide range of outcomes of 
the QBF solver and relate these to the original model checking decision problem. 
To this end, we propose the following semantics for BMC for HyperLTL: 

— Pessimistic semantics (like in LTL BMC) under which pending eventuali- 
ties are considered to be unfulfilled. This semantics works for SFS temporal 
formulas and paves the way for bug hunting. 

— Optimistic semantics considers the dual case, where pending eventualities 
are assumed to be fulfilled at the end of the trace. This semantics works 
for sometimes finitely refutable (SFR) formulas, and allows us to interpret 
unsatisfiability of QBF as proof of correctness even with bounded traces. 

— Halting variants of the optimistic and pessimistic semantics, which allow 
sound and complete decision on a verdict for terminating models. 

We have fully implemented our technique in the tool HyperQube. Our exper- 
imental evaluation includes a rich set of case studies, such as information-flow 
security, linearizability in concurrent data structures, path planning in robotic 
applications, and mutation testing. Our evaluation shows that our technique is 
effective and efficient in identifying bugs in several prominent examples. We also 
show that our QBF-based approach is certainly more efficient than a brute-force 
SAT-based approach, where universal and existential quantifiers are eliminated 
by combinatorial expansion to conjunctions and disjunctions. We also show that 
in some cases our approach can also be used as a tool for synthesis. Indeed, a 
witness to an existential quantifier in a HyperLTL formula is an execution path 
that satisfies the formula. For example, our experiments on path planning for 
robots showcase this feature of HyperQube. 

In summary, the contributions of this paper are as follows. We (1) propose a 
QBF-based BMC approach for verification and falsification of HyperLTL spec- 
ifications; (2) introduce complementary semantics that allow proving and dis- 
proving formulas, given a finite set of finite traces, and (3) rigorously analyze the 
performance of our technique by case studies from different areas of computing. 


2 Preliminaries 


2.1 Kripke Structures 


Let $AP$ be a finite set of atomic propositions and $\Sigma = 2^{AP}$ be the alphabet. A 
letter is an element of $\Sigma$. A trace $t \in \Sigma^\omega$ over alphabet $\Sigma$ is an infinite sequence 
of letters: $t = t(0)t(1)t(2)\cdots$ 


Definition 1. A Kripke structure is a tuple $K = \langle S, S_{init}, \delta, L \rangle$, where 
— $S$ is a finite set of states; 
— $S_{init} \subseteq S$ is the set of initial states; 
— $\delta \subseteq S \times S$ is a transition relation, and 
— $L : S \rightarrow \Sigma$ is a labeling function on the states of $K$. 
We require that for each $s \in S$, there exists $s' \in S$, such that $(s, s') \in \delta$. 

Fig. 1 shows a Kripke structure, where $S_{init} = \{s_0\}$, $L(s_0) = \{p\}$, $L(s_4) = 
\{q, halt\}$, etc. The size of the Kripke structure is the number of its states. A loop 
in $K$ is a finite sequence $s(0)s(1)\cdots s(n)$, such that $(s(i), s(i+1)) \in \delta$, for all 
$0 \leq i < n$, and $(s(n), s(0)) \in \delta$. We call a Kripke frame acyclic, if the only loops 
are self-loops on otherwise terminal states, i.e., on states that have no other 
outgoing transition. Since Definition 1 does not allow terminal states, we only 
consider acyclic Kripke structures with such added self-loops. We also label such 
states by atomic proposition halt. 

A path of a Kripke structure is an infinite sequence of states $s(0)s(1)\cdots \in S^\omega$, 
such that $s(0) \in S_{init}$, and $(s(i), s(i+1)) \in \delta$, for all $i \geq 0$. A trace of a 
Kripke structure is a trace $t(0)t(1)t(2)\cdots \in \Sigma^\omega$, such that there exists a path 
$s(0)s(1)\cdots \in S^\omega$ with $t(i) = L(s(i))$ for all $i \geq 0$. We denote by $Traces(K, s)$ the 
set of all traces of $K$ with paths that start in state $s \in S$, and use $Traces(K)$ as 
a shorthand for $\bigcup_{s \in S_{init}} Traces(K, s)$. 
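
The definitions above can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation. The class `Kripke`, the helper names, and in particular the structure `FIG1_LIKE` are assumptions: the figure itself is not reproduced in the text, so the transitions and the labels of $s_1$, $s_2$, and $s_3$ are guessed from the paths and traces mentioned later in the paper.

```python
from dataclasses import dataclass


@dataclass
class Kripke:
    states: frozenset   # S
    init: frozenset     # S_init, a subset of S
    delta: frozenset    # transition relation, a set of (s, s') pairs
    label: dict         # L: state -> frozenset of atomic propositions


def successors(K, s):
    """States reachable from s in one step."""
    return sorted(t for (u, t) in K.delta if u == s)


def is_total(K):
    """Definition 1 requires every state to have at least one successor."""
    return all(successors(K, s) for s in K.states)


def path_prefixes(K, k):
    """All path prefixes s(0)...s(k) that start in an initial state."""
    frontier = [(s,) for s in sorted(K.init)]
    for _ in range(k):
        frontier = [p + (t,) for p in frontier for t in successors(K, p[-1])]
    return frontier


def trace_prefixes(K, k):
    """Label sequences t(0)...t(k) with t(i) = L(s(i))."""
    return [tuple(K.label[s] for s in p) for p in path_prefixes(K, k)]


# Hypothetical five-state structure in the spirit of Fig. 1 (assumed shape).
FIG1_LIKE = Kripke(
    states=frozenset({"s0", "s1", "s2", "s3", "s4"}),
    init=frozenset({"s0"}),
    delta=frozenset({("s0", "s1"), ("s1", "s2"), ("s1", "s3"),
                     ("s2", "s4"), ("s3", "s3"), ("s4", "s4")}),
    label={"s0": frozenset({"p"}), "s1": frozenset({"p"}),
           "s2": frozenset({"p"}), "s3": frozenset({"p", "halt"}),
           "s4": frozenset({"q", "halt"})},
)
```

With this assumed structure, `path_prefixes(FIG1_LIKE, 3)` enumerates the two prefixes of length 4 that start in `s0`, one ending in each halting state.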


2.2 The Temporal Logic HyperLTL 


Syntax. HyperLTL [9] is an extension of the linear-time temporal logic (LTL) 
for hyperproperties. The syntax of HyperLTL formulas is defined inductively by 
the following grammar: 


$$\varphi ::= \exists\pi.\varphi \mid \forall\pi.\varphi \mid \phi$$
$$\phi ::= \mathsf{true} \mid a_\pi \mid \neg\phi \mid \phi \vee \phi \mid \phi \wedge \phi \mid \phi\ \mathcal{U}\ \phi \mid \phi\ \mathcal{R}\ \phi \mid \bigcirc\phi$$

where $a \in AP$ is an atomic proposition and $\pi$ is a trace variable from an infinite 
supply of variables $\mathcal{V}$. The Boolean connectives $\neg$, $\vee$, and $\wedge$ have the usual 
meaning, $\mathcal{U}$ is the temporal until operator, $\mathcal{R}$ is the temporal release operator, 
and $\bigcirc$ is the temporal next operator. We also consider other derived Boolean 
connectives, such as $\rightarrow$ and $\leftrightarrow$, and the derived temporal operators eventually 
$\Diamond\varphi \equiv \mathsf{true}\ \mathcal{U}\ \varphi$ and globally $\Box\varphi \equiv \neg\Diamond\neg\varphi$. Even though the set of operators 
presented is not minimal, we have introduced this set to make the treatment uniform 
with the variants in Section 3. The quantified formulas $\exists\pi$ and $\forall\pi$ are read as 
"along some trace $\pi$" and "along all traces $\pi$", respectively. A formula is closed 
(i.e., a sentence) if all trace variables used in the formula are quantified. We 
assume, without loss of generality, that no variable is quantified twice. We use 
$Vars(\varphi)$ for the set of path variables used in formula $\varphi$. 


Semantics. An interpretation $\mathcal{T} = \langle T_\pi \rangle_{\pi \in Vars(\varphi)}$ of a formula $\varphi$ consists of a 
tuple of sets of traces, with one set $T_\pi$ per trace variable $\pi$ in $Vars(\varphi)$, denoting 
the set of traces assigned to $\pi$. Note that we allow quantifiers to range over 
different models. We will use this feature in the verification of hyperproperties 
such as linearizability, where different quantifiers are associated with different 
sets of executions (in this case one for the concurrent implementation and one 
for the sequential implementation). That is, each set of traces comes from a 
Kripke structure and we use $\mathcal{K} = \langle K_\pi \rangle_{\pi \in Vars(\varphi)}$ to denote a family of Kripke 
structures, so $T_\pi = Traces(K_\pi)$ is the set of traces that $\pi$ can range over, which 
comes from $K_\pi$. Abusing notation, we write $\mathcal{T} = Traces(\mathcal{K})$. Note that picking a single 
$K$ and letting $K_\pi = K$ for all $\pi$ is a particular case, which leads to the original 
semantics of HyperLTL [9]. 

Our semantics of HyperLTL is defined with respect to a trace assignment, 
which is a partial map $\Pi : Vars(\varphi) \rightharpoonup \Sigma^\omega$. The assignment with the empty 
domain is denoted by $\Pi_\emptyset$. Given a trace assignment $\Pi$, a trace variable $\pi$, and 
a concrete trace $t \in \Sigma^\omega$, we denote by $\Pi[\pi \mapsto t]$ the assignment that coincides 
with $\Pi$ everywhere but at $\pi$, which is mapped to trace $t$. The satisfaction of 
a HyperLTL formula $\varphi$ is a binary relation $\models$ that associates a formula to the 
models $(\mathcal{T}, \Pi, i)$, where $i \in \mathbb{Z}_{\geq 0}$ is a pointer that indicates the current evaluating 
position. The semantics is defined as follows: 


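$(\mathcal{T}, \Pi, 0) \models \exists\pi.\psi$ iff there is a $t \in T_\pi$, such that $(\mathcal{T}, \Pi[\pi \mapsto t], 0) \models \psi$, 

$(\mathcal{T}, \Pi, 0) \models \forall\pi.\psi$ iff for all $t \in T_\pi$, $(\mathcal{T}, \Pi[\pi \mapsto t], 0) \models \psi$, 

$(\mathcal{T}, \Pi, i) \models \mathsf{true}$, 

$(\mathcal{T}, \Pi, i) \models a_\pi$ iff $a \in \Pi(\pi)(i)$, 

$(\mathcal{T}, \Pi, i) \models \neg\psi$ iff $(\mathcal{T}, \Pi, i) \not\models \psi$, 

$(\mathcal{T}, \Pi, i) \models \psi_1 \vee \psi_2$ iff $(\mathcal{T}, \Pi, i) \models \psi_1$ or $(\mathcal{T}, \Pi, i) \models \psi_2$, 

$(\mathcal{T}, \Pi, i) \models \psi_1 \wedge \psi_2$ iff $(\mathcal{T}, \Pi, i) \models \psi_1$ and $(\mathcal{T}, \Pi, i) \models \psi_2$, 

$(\mathcal{T}, \Pi, i) \models \bigcirc\psi$ iff $(\mathcal{T}, \Pi, i+1) \models \psi$, 

$(\mathcal{T}, \Pi, i) \models \psi_1\ \mathcal{U}\ \psi_2$ iff there is a $j \geq i$ for which $(\mathcal{T}, \Pi, j) \models \psi_2$ and 
for all $k \in [i, j)$, $(\mathcal{T}, \Pi, k) \models \psi_1$, 

$(\mathcal{T}, \Pi, i) \models \psi_1\ \mathcal{R}\ \psi_2$ iff either for all $j \geq i$, $(\mathcal{T}, \Pi, j) \models \psi_2$, or, 
for some $j \geq i$, $(\mathcal{T}, \Pi, j) \models \psi_1$ and 
for all $k \in [i, j]$: $(\mathcal{T}, \Pi, k) \models \psi_2$. 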


This semantics is slightly different from the definition in [9], but equivalent 
(see [30]). We say that an interpretation $\mathcal{T}$ satisfies a sentence $\varphi$, denoted 
by $\mathcal{T} \models \varphi$, if $(\mathcal{T}, \Pi_\emptyset, 0) \models \varphi$. We say that a family of Kripke structures 
$\mathcal{K}$ satisfies a sentence $\varphi$, denoted by $\mathcal{K} \models \varphi$, if $\langle Traces(K_\pi) \rangle_{\pi \in Vars(\varphi)} \models \varphi$. 
When the same Kripke structure $K$ is used for all path variables we write 
$K \models \varphi$. For example, the Kripke structure in Fig. 1 satisfies HyperLTL 
formula $\varphi = \forall\pi_A.\exists\pi_B.\Diamond\Box(p_{\pi_A} \leftrightarrow p_{\pi_B})$. 


3 Bounded Semantics for HyperLTL 


We now introduce the bounded semantics of HyperLTL, used in Section 4 to 
generate queries to a QBF solver to aid solving the model checking problem. 


3.1 Bounded Semantics 


We assume the HyperLTL formula is closed and of the form 
$Q_A\pi_A.Q_B\pi_B \ldots Q_Z\pi_Z.\psi$, where $Q \in \{\forall, \exists\}$, and that it has been converted into 
negation-normal form (NNF) so that the negation symbol only appears in front 
of atomic propositions, e.g., $\neg a_{\pi_A}$. Without loss of generality and for the sake of 
clarity from other numerical indices, we use the roman alphabet as indices of trace 
variables. Thus, we assume that $Vars(\varphi) \subseteq \{\pi_A, \pi_B, \ldots, \pi_Z\}$. The main idea of 
BMC is to perform incremental exploration of the state space of the systems by 
unrolling the systems and the formula up to a bound. Let $k \geq 0$ be the unrolling 
bound and let $\mathcal{T} = \langle T_A \ldots T_Z \rangle$ be a tuple of sets of traces, one per trace 
variable. We start by defining a satisfaction relation between HyperLTL formulas 
for a bounded exploration $k$ and models $(\mathcal{T}, \Pi, i)$, where $\mathcal{T}$ is the tuple of sets of 
traces, $\Pi$ is a trace assignment mapping (as defined in Section 2), and $i \in \mathbb{Z}_{\geq 0}$ 
points to the position of traces. We will define different finite satisfaction 
relations for general models (for $* = pes, opt, hpes, hopt$): 


— $\models_k^{*}$, the common satisfaction relation among all semantics, 
— $\models_k^{pes}$, called pessimistic semantics, 
— $\models_k^{opt}$, called optimistic semantics, and 
— $\models_k^{hpes}$ and $\models_k^{hopt}$, variants of $\models_k^{pes}$ and $\models_k^{opt}$ for Kripke structures that encode 
termination of traces (modeled as self-loops to provide infinite traces). 


All these semantics coincide in the interpretation of quantifiers, Boolean 
connectives, and temporal operators up to instant $k - 1$, but differ in their assumptions 
about unseen future events after the bound of observation $k$. 


Quantifiers. The satisfaction relation for the quantifiers is the following: 


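$(\mathcal{T}, \Pi, 0) \models_k^{*} \exists\pi.\psi$ iff there is a $t \in T_\pi$ : $(\mathcal{T}, \Pi[\pi \mapsto t], 0) \models_k^{*} \psi$, (1) 
$(\mathcal{T}, \Pi, 0) \models_k^{*} \forall\pi.\psi$ iff for all $t \in T_\pi$ : $(\mathcal{T}, \Pi[\pi \mapsto t], 0) \models_k^{*} \psi$. (2) 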


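Boolean operators. For every $i \leq k$, we have: 

$(\mathcal{T}, \Pi, i) \models_k^{*} \mathsf{true}$, (3) 
$(\mathcal{T}, \Pi, i) \models_k^{*} a_\pi$ iff $a \in \Pi(\pi)(i)$, (4) 
$(\mathcal{T}, \Pi, i) \models_k^{*} \neg a_\pi$ iff $a \notin \Pi(\pi)(i)$, (5) 
$(\mathcal{T}, \Pi, i) \models_k^{*} \psi_1 \vee \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{*} \psi_1$ or $(\mathcal{T}, \Pi, i) \models_k^{*} \psi_2$, (6) 
$(\mathcal{T}, \Pi, i) \models_k^{*} \psi_1 \wedge \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{*} \psi_1$ and $(\mathcal{T}, \Pi, i) \models_k^{*} \psi_2$. (7) 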


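Temporal connectives. The case where $i < k$ is common between the 
optimistic and pessimistic semantics: 

$(\mathcal{T}, \Pi, i) \models_k^{*} \bigcirc\psi$ iff $(\mathcal{T}, \Pi, i+1) \models_k^{*} \psi$, (8) 
$(\mathcal{T}, \Pi, i) \models_k^{*} \psi_1\ \mathcal{U}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{*} \psi_2$, or 
$(\mathcal{T}, \Pi, i) \models_k^{*} \psi_1$ and $(\mathcal{T}, \Pi, i+1) \models_k^{*} \psi_1\ \mathcal{U}\ \psi_2$, (9) 
$(\mathcal{T}, \Pi, i) \models_k^{*} \psi_1\ \mathcal{R}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{*} \psi_2$, and 
$(\mathcal{T}, \Pi, i) \models_k^{*} \psi_1$ or $(\mathcal{T}, \Pi, i+1) \models_k^{*} \psi_1\ \mathcal{R}\ \psi_2$. (10) 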


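For $i = k$, in the pessimistic semantics the eventualities (including $\bigcirc$) are 
assumed to never be fulfilled in the future, so the current instant $k$ is the last 
chance: 

$(\mathcal{T}, \Pi, i) \models_k^{pes} \bigcirc\psi$ iff never happens, (P1) 
$(\mathcal{T}, \Pi, i) \models_k^{pes} \psi_1\ \mathcal{U}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{pes} \psi_2$, (P2) 
$(\mathcal{T}, \Pi, i) \models_k^{pes} \psi_1\ \mathcal{R}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{pes} \psi_1 \wedge \psi_2$. (P3) 

On the other hand, in the optimistic semantics the eventualities are assumed to 
be fulfilled in the future: 

$(\mathcal{T}, \Pi, i) \models_k^{opt} \bigcirc\psi$ iff always happens, (O1) 
$(\mathcal{T}, \Pi, i) \models_k^{opt} \psi_1\ \mathcal{U}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{opt} \psi_1 \vee \psi_2$, (O2) 
$(\mathcal{T}, \Pi, i) \models_k^{opt} \psi_1\ \mathcal{R}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{opt} \psi_2$. (O3) 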


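To capture the halting semantics, we use the predicate halt that is true 
if the state corresponds to a halting state (self-loop), and define 
$halted \stackrel{\text{def}}{=} \bigwedge_{\pi \in Vars(\varphi)} halt_\pi$, which holds whenever all traces have halted (and their 
final state will be repeated ad infinitum). Then, the halting semantics of the temporal 
case for $i = k$ in the pessimistic case considers the halting case to infer the actual 
value of the temporal operators on the (now fully known) trace: 

$(\mathcal{T}, \Pi, i) \models_k^{hpes} \bigcirc\psi$ iff $(\mathcal{T}, \Pi, i) \models_k^{*} halted$ and $(\mathcal{T}, \Pi, i) \models_k^{hpes} \psi$, (HP1) 
$(\mathcal{T}, \Pi, i) \models_k^{hpes} \psi_1\ \mathcal{U}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{hpes} \psi_2$, (HP2) 
$(\mathcal{T}, \Pi, i) \models_k^{hpes} \psi_1\ \mathcal{R}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{hpes} \psi_1 \wedge \psi_2$, or 
$(\mathcal{T}, \Pi, i) \models_k^{*} halted$ and $(\mathcal{T}, \Pi, i) \models_k^{hpes} \psi_2$. (HP3) 

Dually, in the halting optimistic case: 

$(\mathcal{T}, \Pi, i) \models_k^{hopt} \bigcirc\psi$ iff $(\mathcal{T}, \Pi, i) \not\models_k^{*} halted$ or $(\mathcal{T}, \Pi, i) \models_k^{hopt} \psi$, (HO1) 
$(\mathcal{T}, \Pi, i) \models_k^{hopt} \psi_1\ \mathcal{U}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{hopt} \psi_2$, or 
$(\mathcal{T}, \Pi, i) \not\models_k^{*} halted$ and $(\mathcal{T}, \Pi, i) \models_k^{hopt} \psi_1$, (HO2) 
$(\mathcal{T}, \Pi, i) \models_k^{hopt} \psi_1\ \mathcal{R}\ \psi_2$ iff $(\mathcal{T}, \Pi, i) \models_k^{hopt} \psi_2$. (HO3) 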


Complete semantics. We are now ready to define the four semantics: 


— Pessimistic semantics: $\models_k^{pes}$ uses rules (1)-(10) and (P1)-(P3). 
— Optimistic semantics: $\models_k^{opt}$ uses rules (1)-(10) and (O1)-(O3). 
— Halting pessimistic semantics: $\models_k^{hpes}$ uses rules (1)-(10) and (HP1)-(HP3). 
— Halting optimistic semantics: $\models_k^{hopt}$ uses rules (1)-(10) and (HO1)-(HO3). 
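
The four bounded semantics can be prototyped directly as a recursive evaluator over finite trace prefixes. The sketch below is illustrative and is not HyperQube's implementation (the tool works through a QBF encoding rather than explicit enumeration). It assumes formulas in NNF written as nested tuples, traces given as lists of $k+1$ label sets, and finite sets of candidate prefixes for the quantifiers; the names `holds` and `check` are hypothetical.

```python
def holds(f, traces, i, k, sem):
    """Bounded satisfaction of an NNF formula f at position i <= k.

    traces maps each trace variable to a list of k+1 sets of propositions;
    sem is one of "pes", "opt", "hpes", "hopt".
    """
    def rec(g, j=i):
        return holds(g, traces, j, k, sem)

    op = f[0]
    if op == "true":
        return True
    if op == "ap":                       # a_pi, rule (4)
        return f[1] in traces[f[2]][i]
    if op == "nap":                      # not a_pi, rule (5)
        return f[1] not in traces[f[2]][i]
    if op == "or":
        return rec(f[1]) or rec(f[2])
    if op == "and":
        return rec(f[1]) and rec(f[2])
    if op not in ("X", "U", "R"):
        raise ValueError(f"unknown operator {op!r}")
    if i < k:                            # rules (8)-(10), shared by all semantics
        if op == "X":
            return rec(f[1], i + 1)
        if op == "U":
            return rec(f[2]) or (rec(f[1]) and rec(f, i + 1))
        return rec(f[2]) and (rec(f[1]) or rec(f, i + 1))
    # i == k: the semantics differ only here.
    halted = all("halt" in t[i] for t in traces.values())
    if op == "X":
        a = rec(f[1])
        return {"pes": False, "opt": True,
                "hpes": halted and a, "hopt": (not halted) or a}[sem]
    a, b = rec(f[1]), rec(f[2])
    if op == "U":
        return {"pes": b, "opt": a or b,
                "hpes": b, "hopt": b or ((not halted) and a)}[sem]
    return {"pes": a and b, "opt": b,
            "hpes": (a and b) or (halted and b), "hopt": b}[sem]


def check(quants, matrix, trace_sets, k, sem, assignment=None):
    """Rules (1)-(2): quants is a list of ("exists" | "forall", variable)."""
    assignment = assignment or {}
    if not quants:
        return holds(matrix, assignment, 0, k, sem)
    (q, pi), rest = quants[0], quants[1:]
    results = (check(rest, matrix, trace_sets, k, sem, {**assignment, pi: t})
               for t in trace_sets[pi])
    return any(results) if q == "exists" else all(results)


# Example: "eventually q" on a prefix where q never holds.
eventually_q = ("U", ("true",), ("ap", "q", "A"))
running = [{"p"}, {"p"}, {"p"}, {"p"}]
stopped = [{"p"}, {"p"}, {"p"}, {"p", "halt"}]
```

On the prefix `running` with $k = 3$, the pending eventuality is rejected under `"pes"` and accepted under `"opt"`. On `stopped`, where the trace has halted, `"hopt"` also rejects it, because the whole future of the trace is known.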


3.2 The Logical Relation between Different Semantics 


Observe that the pessimistic semantics is the semantics in the traditional BMC 
for LTL. In the pessimistic semantics a formula is declared false unless it is 
witnessed to be true within the bound explored. In other words, formulas can only 
get "truer" with more information obtained by a longer unrolling. Dually, the 
optimistic semantics considers a formula true unless there is evidence within the 
bounded exploration to the contrary. Therefore, formulas only get "falser" with 
further unrolling. For example, formula $\Box p$ always evaluates to false in the 
pessimistic semantics. In the optimistic semantics, it evaluates to true up to bound 
$k$ if $p$ holds in all states of the trace up to and including $k$. However, if the 
formula evaluates to false at some point before $k$, then it evaluates to false for all 
$j \geq k$. The following lemma formalizes this intuition in HyperLTL. 


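Lemma 1. Let $k \leq j$. Then, 
1. If $(\mathcal{T}, \Pi, 0) \models_k^{pes} \varphi$, then $(\mathcal{T}, \Pi, 0) \models_j^{pes} \varphi$. 
2. If $(\mathcal{T}, \Pi, 0) \not\models_k^{opt} \varphi$, then $(\mathcal{T}, \Pi, 0) \not\models_j^{opt} \varphi$. 
3. If $(\mathcal{T}, \Pi, 0) \models_k^{hpes} \varphi$, then $(\mathcal{T}, \Pi, 0) \models_j^{hpes} \varphi$. 
4. If $(\mathcal{T}, \Pi, 0) \not\models_k^{hopt} \varphi$, then $(\mathcal{T}, \Pi, 0) \not\models_j^{hopt} \varphi$. 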

In turn, the verdict obtained from the exploration up-to k can (in some cases) 
be used to infer the verdict of the model checking problem. As in classical BMC, 
if the pessimistic semantics find a model, then it is indeed a model. Dually, if 
our optimistic semantics fail to find a model, then there is no model. The next 
lemma formally captures this intuition. 


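Lemma 2 (Infinite inference). The following hold for every $k$, 
1. If $(\mathcal{T}, \Pi, 0) \models_k^{pes} \varphi$, then $(\mathcal{T}, \Pi, 0) \models \varphi$. 
2. If $(\mathcal{T}, \Pi, 0) \not\models_k^{opt} \varphi$, then $(\mathcal{T}, \Pi, 0) \not\models \varphi$. 
3. If $(\mathcal{T}, \Pi, 0) \models_k^{hpes} \varphi$, then $(\mathcal{T}, \Pi, 0) \models \varphi$. 
4. If $(\mathcal{T}, \Pi, 0) \not\models_k^{hopt} \varphi$, then $(\mathcal{T}, \Pi, 0) \not\models \varphi$. 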


Example 1. Consider the Kripke structure in Fig. 1, bound $k = 3$, and formula 
$\varphi_1 = \forall\pi_A.\exists\pi_B.((p_{\pi_A} \not\leftrightarrow p_{\pi_B})\ \mathcal{R}\ \neg q_{\pi_A})$. It is easy to see that instantiating $\pi_A$ 
with trace $s_0s_1s_2s_4$ falsifies $\varphi_1$ in the pessimistic semantics. By Lemma 2, this 
counterexample shows that the Kripke structure is a model of $\neg\varphi_1$ in the infinite 
semantics as well. That is, $K \models_3^{pes} \neg\varphi_1$ and, hence, $K \models \neg\varphi_1$, so $K \not\models \varphi_1$. 

Consider again the same Kripke structure, bound $k = 3$, and formula $\varphi_2 = 
\forall\pi_A.\exists\pi_B.\Diamond(p_{\pi_A} \leftrightarrow q_{\pi_B})$. To disprove $\varphi_2$, we need to find a trace $\pi_A$ such that 
for all other $\pi_B$, proposition $q$ in $\pi_B$ always disagrees with $p$ in $\pi_A$. It is 
straightforward to observe that such a trace $\pi_A$ does not exist. By Lemma 2, proving 
the formula is not satisfiable up to bound 3 in the optimistic semantics implies 
that $K$ is not a model of $\neg\varphi_2$ in the infinite semantics. That is, $K \not\models_3^{opt} \neg\varphi_2$ 
implies $K \not\models \neg\varphi_2$. Hence, we conclude $K \models \varphi_2$. 

Consider again the same Kripke structure, which has two terminating states, 
$s_3$ and $s_4$, labeled by atomic proposition halt with only a self-loop. Let $k = 3$, 
and $\varphi_3 = \forall\pi_A.\exists\pi_B.(\neg q_{\pi_B}\ \mathcal{U}\ \neg p_{\pi_A})$. Instantiating $\pi_A$ by trace $s_0s_1s_3$, which is of 
the form $\{p\}^\omega$, satisfies $\neg\varphi_3$. By Lemma 2, the fulfillment of the formula implies that 
in the infinite semantics it will be fulfilled as well. That is, $K \models_3^{hpes} \neg\varphi_3$ implies 
$K \models \neg\varphi_3$. Hence, $K \not\models \varphi_3$. 

Consider again the same Kripke structure with halting states and formula 
$\varphi_4 = \forall\pi_A.\exists\pi_B.\Diamond\Box(p_{\pi_A} \not\leftrightarrow p_{\pi_B})$. A counterexample is an instantiation of $\pi_A$ 
such that for all $\pi_B$, both traces will always eventually agree on $p$. Consider trace 
$s_0s_1s_2s_4$, which is of the form $\{p\}\{p\}\{p\}\{q, halt\}^\omega$, with $k = 3$. This trace never agrees 
with a trace that ends in state $s_3$ (which is of the form $\{p\}^\omega$) and vice versa. By 
Lemma 2, the absence of a counterexample up to bound 3 in the halting optimistic 
semantics implies that $K$ is not a model of $\neg\varphi_4$ in the infinite semantics. That 
is, $K \not\models_3^{hopt} \neg\varphi_4$ implies $K \not\models \neg\varphi_4$. Hence, we conclude $K \models \varphi_4$. 


4 Reducing BMC to QBF Solving 


Given a family of Kripke structures $\mathcal{K}$, a HyperLTL formula $\varphi$, and bound $k \geq 0$, 
our goal is to construct a QBF formula $\llbracket \mathcal{K}, \varphi \rrbracket_k$ whose satisfiability can be used 
to infer whether or not $\mathcal{K} \models \varphi$. 

In the following paragraphs, we first describe how to encode the model and 
the formula, and then how to combine the two to generate the QBF query. We 
will illustrate the constructions using formula $\varphi_1$ in Example 1 in Section 3, 
whose negation is $\exists\pi_A.\forall\pi_B.\neg\psi$ with $\neg\psi = (p_{\pi_A} \leftrightarrow p_{\pi_B})\ \mathcal{U}\ q_{\pi_A}$. 


Encoding the models. The unrolling of the transition relation of a Kripke 
structure $K_A = \langle S, S_{init}, \delta, L \rangle$ up to bound $k$ is analogous to the BMC encoding for 
LTL [8]. First, note that the state space $S$ can be encoded with a (logarithmic) 
number of bits in $|S|$. We introduce additional variables $n_0, n_1, \ldots$ to encode 
the state of the Kripke structure and use $AP^* = AP \cup \{n_0, n_1, \ldots\}$ for the 
extended alphabet that includes the encoding of $S$. In this manner, the set of initial 
states of a Kripke structure is a Boolean formula over $AP^*$. For example, for the 
Kripke structure $K_A$ in Fig. 1 the set of initial states (in this case $S_{init} = \{s_0\}$) 
corresponds to the following Boolean formula: 

$$I_A := (\neg n_0 \wedge \neg n_1 \wedge \neg n_2) \wedge p \wedge \neg q \wedge \neg halt$$

assuming that $(\neg n_0 \wedge \neg n_1 \wedge \neg n_2)$ represents state $s_0$ (we need three bits to 
encode five states). Similarly, $R_A$ is a binary relation that encodes the transition 
relation $\delta$ of $K_A$ (representing the relation between a state and its successor). The 
encoding into QBF works by introducing fresh Boolean variables (a new copy of 
$AP^*$ for each Kripke structure $K_A$ and position), and then producing a Boolean 
formula that encodes the unrolling up to $k$. We use $x_A^i$ for the set of fresh copies 
of the variables $AP^*$ of $K_A$ corresponding to position $i \in [0, k]$. Therefore, there 
are $(k+1)|x_A| = (k+1)|AP^*_A|$ Boolean variables to represent the unrolling of $K_A$. We use 
$I_A(x)$ for the Boolean formula (using variables from $x$) that encodes the initial 
states, and $R_A(x, x')$ (for two copies of the variables $x$ and $x'$) for the Boolean 
formula that encodes whether $x'$ is a successor state of $x$. For example, for $k = 3$, we 
unroll the transition relation up to 3 as follows, 

$$\llbracket K_A \rrbracket_3 = I_A(x_A^0) \wedge R_A(x_A^0, x_A^1) \wedge R_A(x_A^1, x_A^2) \wedge R_A(x_A^2, x_A^3)$$

which is the Boolean formula representing valid traces of length 4, using four 
copies of the variables $AP^*_A$ that represent the Kripke structure $K_A$. 
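
The shape of the model unrolling can be sketched symbolically. The code below only builds a textual formula and counts variables; it is not the authors' genqbf procedure. The function names are hypothetical, and the variable count assumes one copy of $AP^*$ per position $0, \ldots, k$, which matches the four copies used for $k = 3$ in the example above.

```python
def bits_needed(num_states):
    """Number of Boolean variables n_0, n_1, ... needed to encode the states."""
    return (num_states - 1).bit_length()


def num_variables(ap_star_size, k):
    """Assumption: one fresh copy of AP* for each position 0..k."""
    return (k + 1) * ap_star_size


def unroll_model(name, k):
    """Textual form of the unrolling: I(x^0) & R(x^0, x^1) & ... & R(x^{k-1}, x^k)."""
    parts = [f"I_{name}(x_{name}^0)"]
    parts += [f"R_{name}(x_{name}^{i}, x_{name}^{i + 1})" for i in range(k)]
    return " & ".join(parts)


# Five states need three state bits; with p, q, and halt, AP* has six variables.
ap_star = bits_needed(5) + 3
print(unroll_model("A", 3))
print(num_variables(ap_star, 3))
```

For `unroll_model("A", 3)` the result has one initial-state conjunct followed by three transition conjuncts, mirroring the displayed formula for $k = 3$.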


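Encoding the inner LTL formula. The idea of the construction of the inner LTL 
formula is analogous to standard BMC as well, except for the choice of different 
semantics described in Section 3. In particular, we introduce the following 
inductive construction and define four different unrollings for a given $k$: 
$\llbracket\cdot\rrbracket_{i,k}^{pes}$, $\llbracket\cdot\rrbracket_{i,k}^{opt}$, $\llbracket\cdot\rrbracket_{i,k}^{hpes}$, and $\llbracket\cdot\rrbracket_{i,k}^{hopt}$. 

— Inductive Case: Since the semantics only differ on the temporal operators 
at the end of the unrolling, the inductive case is common to all unrollings 
and we use $\llbracket\cdot\rrbracket_{i,k}^{*}$ to mean any of the choices of semantics (for $* = 
pes, opt, hpes, hopt$). For all $i \leq k$: 

$\llbracket p_\pi \rrbracket_{i,k}^{*} := p_\pi^i$ 
$\llbracket \neg p_\pi \rrbracket_{i,k}^{*} := \neg p_\pi^i$ 
$\llbracket \psi_1 \vee \psi_2 \rrbracket_{i,k}^{*} := \llbracket \psi_1 \rrbracket_{i,k}^{*} \vee \llbracket \psi_2 \rrbracket_{i,k}^{*}$ 
$\llbracket \psi_1 \wedge \psi_2 \rrbracket_{i,k}^{*} := \llbracket \psi_1 \rrbracket_{i,k}^{*} \wedge \llbracket \psi_2 \rrbracket_{i,k}^{*}$ 
$\llbracket \psi_1\ \mathcal{U}\ \psi_2 \rrbracket_{i,k}^{*} := \llbracket \psi_2 \rrbracket_{i,k}^{*} \vee \big(\llbracket \psi_1 \rrbracket_{i,k}^{*} \wedge \llbracket \psi_1\ \mathcal{U}\ \psi_2 \rrbracket_{i+1,k}^{*}\big)$ 
$\llbracket \psi_1\ \mathcal{R}\ \psi_2 \rrbracket_{i,k}^{*} := \llbracket \psi_2 \rrbracket_{i,k}^{*} \wedge \big(\llbracket \psi_1 \rrbracket_{i,k}^{*} \vee \llbracket \psi_1\ \mathcal{R}\ \psi_2 \rrbracket_{i+1,k}^{*}\big)$ 
$\llbracket \bigcirc\psi \rrbracket_{i,k}^{*} := \llbracket \psi \rrbracket_{i+1,k}^{*}$ 

Note that, for a given path variable $\pi_A$, the atom $p_{\pi_A}^i$ that results from 
$\llbracket p_{\pi_A} \rrbracket_{i,k}^{*}$ is one of the Boolean variables in $x_A^i$. 

— For the base case, the formula generated is different depending on the 
intended semantics: 

$\llbracket \psi \rrbracket_{k+1,k}^{pes} := \mathsf{false}$, $\quad \llbracket \psi \rrbracket_{k+1,k}^{opt} := \mathsf{true}$, 
$\llbracket \psi \rrbracket_{k+1,k}^{hpes} := \llbracket halted \rrbracket_{k,k}^{hpes} \wedge \llbracket \psi \rrbracket_{k,k}^{hpes}$, $\quad \llbracket \psi \rrbracket_{k+1,k}^{hopt} := \llbracket halted \rrbracket_{k,k}^{hopt} \rightarrow \llbracket \psi \rrbracket_{k,k}^{hopt}$ 

Note that the base case defines the value to be assumed for the formula after 
the end $k$ of the unrolling, which is spawned in the temporal operators in 
the inductive case at $k$. The pessimistic semantics assume the formula to 
be false, and the optimistic semantics assume the formula to be true. The 
halting cases consider the case at which the traces have halted (using in this 
case the evaluation at $k$) and use the unhalting choice otherwise. 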
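
The inductive construction can be sketched as a recursive function that produces the unrolled Boolean formula as text. This is a simplified illustration, not the authors' encoder: it covers only the pessimistic and optimistic variants, applies the inductive rules at positions $0, \ldots, k$ and the base case at position $k+1$, and simplifies the constants `true` and `false` away. The tuple-based formula syntax and the `"iff"` node (a convenience for atoms of the form $p_{\pi_A} \leftrightarrow p_{\pi_B}$) are assumptions of this sketch.

```python
def lor(a, b):
    """Disjunction with constant simplification."""
    if "true" in (a, b):
        return "true"
    if a == "false":
        return b
    if b == "false":
        return a
    return f"({a} | {b})"


def land(a, b):
    """Conjunction with constant simplification."""
    if "false" in (a, b):
        return "false"
    if a == "true":
        return b
    if b == "true":
        return a
    return f"({a} & {b})"


def unroll(f, i, k, sem):
    """Unrolled formula for f at position i with bound k; sem is "pes" or "opt"."""
    if i > k:                                   # base case at position k + 1
        return "false" if sem == "pes" else "true"
    op = f[0]
    if op == "true":
        return "true"
    if op == "ap":                              # p_pi becomes the variable p_pi^i
        return f"{f[1]}_{f[2]}^{i}"
    if op == "nap":
        return f"~{f[1]}_{f[2]}^{i}"
    if op == "iff":                             # convenience node, see lead-in
        (a, pa), (b, pb) = f[1], f[2]
        return f"({a}_{pa}^{i} <-> {b}_{pb}^{i})"
    if op == "or":
        return lor(unroll(f[1], i, k, sem), unroll(f[2], i, k, sem))
    if op == "and":
        return land(unroll(f[1], i, k, sem), unroll(f[2], i, k, sem))
    if op == "X":
        return unroll(f[1], i + 1, k, sem)
    if op == "U":
        return lor(unroll(f[2], i, k, sem),
                   land(unroll(f[1], i, k, sem), unroll(f, i + 1, k, sem)))
    if op == "R":
        return land(unroll(f[2], i, k, sem),
                    lor(unroll(f[1], i, k, sem), unroll(f, i + 1, k, sem)))
    raise ValueError(f"unknown operator {op!r}")


# (p_A <-> p_B) U q_A, the negated inner formula used in the running example.
neg_psi = ("U", ("iff", ("p", "A"), ("p", "B")), ("ap", "q", "A"))
print(unroll(neg_psi, 0, 3, "pes"))
```

With `sem="pes"` the innermost conjunct collapses to the atom at position 3, which reflects rule (P2); with `sem="opt"` the last position instead yields a disjunction of both operands, which reflects rule (O2).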


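Example 2. Consider again the formula $\neg\psi = (p_{\pi_A} \leftrightarrow p_{\pi_B})\ \mathcal{U}\ q_{\pi_A}$. Using the 
pessimistic semantics, $\llbracket \neg\psi \rrbracket_{0,3}^{pes}$ with three steps is 

$$q_{\pi_A}^0 \vee \Big((p_{\pi_A}^0 \leftrightarrow p_{\pi_B}^0) \wedge \Big(q_{\pi_A}^1 \vee \Big((p_{\pi_A}^1 \leftrightarrow p_{\pi_B}^1) \wedge \Big(q_{\pi_A}^2 \vee \big((p_{\pi_A}^2 \leftrightarrow p_{\pi_B}^2) \wedge q_{\pi_A}^3\big)\Big)\Big)\Big)\Big).$$

In this encoding, the collection $x_A^2$ contains all variables of $AP^*$ of $K_A$ (that is, 
$\{p_{\pi_A}^2, q_{\pi_A}^2, \ldots\}$), connecting to the corresponding valuation for $p_{\pi_A}$ in the trace 
of $K_A$ at step 2 in the unrolling of $K_A$. In other words, the formula $\llbracket \neg\psi \rrbracket_{0,3}^{pes}$ uses 
variables from $x_A^0, x_A^1, x_A^2, x_A^3$ and $x_B^0, x_B^1, x_B^2, x_B^3$ (that is, from $\overline{x_A}$ and $\overline{x_B}$). 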


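Combining the encodings. Now, let $\varphi$ be a HyperLTL formula of the form 
$\varphi = Q_A\pi_A.Q_B\pi_B \ldots Q_Z\pi_Z.\psi$ and $\mathcal{K} = \langle K_A, K_B, \ldots, K_Z \rangle$. Combining all the 
components, the encoding of the HyperLTL BMC problem in QBF is the following 
(for $* = pes, opt, hpes, hopt$): 

$$\llbracket \mathcal{K}, \varphi \rrbracket_k^{*} = Q_A\overline{x_A}.Q_B\overline{x_B} \cdots Q_Z\overline{x_Z} \big(\llbracket K_A \rrbracket_k \circ_A \llbracket K_B \rrbracket_k \circ_B \cdots \llbracket K_Z \rrbracket_k \circ_Z \llbracket \psi \rrbracket_{0,k}^{*}\big)$$

where $\llbracket \psi \rrbracket_{0,k}^{*}$ is the choice of semantics, $\circ_j = \wedge$ if $Q_j = \exists$, and $\circ_j = \rightarrow$ if $Q_j = \forall$, 
for $j \in Vars(\varphi)$. 

Example 3. Consider again Example 2. To combine the model description with 
the encoding of the HyperLTL formula, we use two identical copies of the given 
Kripke structure to represent different paths $\pi_A$ and $\pi_B$ on the model, denoted 
as $K_A$ and $K_B$. The final resulting formula is: 

$$\llbracket \mathcal{K}, \neg\varphi \rrbracket_3^{pes} := \exists\overline{x_A}.\forall\overline{x_B}.\big(\llbracket K_A \rrbracket_3 \wedge (\llbracket K_B \rrbracket_3 \rightarrow \llbracket \neg\psi \rrbracket_{0,3}^{pes})\big)$$

The sequence of assignments $(\neg n_2, \neg n_1, \neg n_0, p, \neg q, \neg halt)^0$ $(\neg n_2, \neg n_1, n_0, p, \neg q, 
\neg halt)^1$ $(\neg n_2, n_1, \neg n_0, p, \neg q, \neg halt)^2$ $(n_2, \neg n_1, \neg n_0, \neg p, q, halt)^3$ on $K_A$, 
corresponding to the path $s_0s_1s_2s_4$, satisfies $\llbracket \neg\psi \rrbracket_{0,3}^{pes}$ for all traces on $K_B$. The 
satisfaction result shows that $\llbracket \mathcal{K}, \neg\varphi \rrbracket_3^{pes}$ is true, indicating that a witness of 
violation is found. Theorem 1, by a successful detection of a counterexample witness 
and the use of the pessimistic semantics, allows us to conclude that $K \not\models \varphi$. 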
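
The way the quantifier prefix and the connectives are chosen can be sketched as follows. The function `combine` is hypothetical and works on placeholder strings rather than real encodings. It assumes the connectives nest to the right, which is how the two-quantifier instance of Example 3 is parenthesized; the general formula in the text does not show parentheses.

```python
def combine(quants, k, matrix):
    """Build the textual QBF query from a quantifier list and an encoded matrix.

    quants is a list of ("exists" | "forall", name), outermost quantifier first.
    An existential quantifier contributes a conjunction with its model unrolling,
    a universal quantifier contributes an implication.
    """
    prefix = "".join(f"{'E' if q == 'exists' else 'A'} x_{n}. " for q, n in quants)
    body = matrix
    for q, n in reversed(quants):           # assumption: right-nested connectives
        op = "&" if q == "exists" else "->"
        body = f"([[K_{n}]]_{k} {op} {body})"
    return prefix + body


query = combine([("exists", "A"), ("forall", "B")], 3, "[[~psi]]_0,3")
print(query)
```

For the negated formula of the running example this yields an exists-forall prefix, a conjunction for the existential trace, and an implication for the universal one.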


The main result of this section is Theorem 1 that connects the output of the 
solver to the original model checking problem. We first show an auxiliary lemma. 


Lemma 3. Let $\varphi$ be a closed HyperLTL formula and $\mathcal{T} = Traces(\mathcal{K})$ be an 
interpretation. For $* = pes, opt, hpes, hopt$, it holds that 

$\llbracket \mathcal{K}, \varphi \rrbracket_k^{*}$ is satisfiable if and only if $(\mathcal{T}, \Pi_\emptyset, 0) \models_k^{*} \varphi$. 

Proof (sketch). The proof proceeds in two steps. First, let $\psi$ be the largest 
quantifier-free sub-formula of $\varphi$. Then, every tuple of traces of length $k$ (one 
for each $\pi$) is in one-to-one correspondence with the collection of variables $p_\pi^i$, 
which satisfies that the tuple is a model of $\psi$ (in the choice semantics) if and 
only if the corresponding assignment makes $\llbracket \psi \rrbracket_{0,k}^{*}$ true. Then, the second part shows 
inductively in the stack of quantifiers that each subformula obtained by adding 
a quantifier is satisfiable if and only if the semantics hold. 


Lemma 3, together with Lemma 2, allows us to infer the outcome of the model 
checking problem from satisfying (or unsatisfying) instances of QBF queries, 
summarized in the following theorem. 

Theorem 1. Let $\varphi$ be a HyperLTL formula. Then, 
1. For $* = pes, hpes$, if $\llbracket \mathcal{K}, \neg\varphi \rrbracket_k^{*}$ is satisfiable, then $\mathcal{K} \not\models \varphi$. 
2. For $* = opt, hopt$, if $\llbracket \mathcal{K}, \neg\varphi \rrbracket_k^{*}$ is unsatisfiable, then $\mathcal{K} \models \varphi$. 

Table 1 illustrates what Theorem 1 allows us to soundly conclude from the 
output of the QBF solver about the model checking problem of formulas from 
Example 1 in Section 3. 
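
Theorem 1 amounts to a small decision table over the chosen semantics and the solver's answer for the negated formula. A minimal sketch follows; the function name `verdict` and the result strings are illustrative, and the solver answer is assumed to be given as the string `"SAT"` or `"UNSAT"`.

```python
def verdict(semantics, solver_result):
    """Interpret the QBF answer for the encoding of the negated formula."""
    if semantics in ("pes", "hpes") and solver_result == "SAT":
        return "counterexample"     # Theorem 1, item 1: K does not satisfy phi
    if semantics in ("opt", "hopt") and solver_result == "UNSAT":
        return "proved"             # Theorem 1, item 2: K satisfies phi
    return "inconclusive"           # the bound must be increased or the semantics changed


for sem, res in [("pes", "SAT"), ("pes", "UNSAT"), ("opt", "SAT"), ("opt", "UNSAT")]:
    print(sem, res, "->", verdict(sem, res))
```

Every other combination is inconclusive: an UNSAT answer under a pessimistic semantics or a SAT answer under an optimistic one says nothing about the unbounded problem.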


5 Evaluation and Case Studies 


We now evaluate our approach by a rich set of case studies on information-flow 
security, concurrent data structures, path planning for robots, and mutation 
testing. In this section, we will refer to each property in HyperLTL as in Table 2. 


Bounded Model Checking for Hyperproperties 105 


                         Semantics 
Formula       Bound    pessimistic              optimistic               halting 
$\varphi_1$   k = 2    UNSAT (inconclusive)     SAT (inconclusive)       UNSAT (inconclusive) 
              k = 3    SAT (counterexample)     SAT (inconclusive)       UNSAT (inconclusive) 
$\varphi_2$   k = 2    UNSAT (inconclusive)     SAT (inconclusive)       UNSAT (inconclusive) 
              k = 3    UNSAT (inconclusive)     UNSAT (proved)           UNSAT (inconclusive) 
$\varphi_3$   k = 2    UNSAT (inconclusive)     UNSAT (inconclusive)     non-halted (inconclusive) 
              k = 3    UNSAT (inconclusive)     UNSAT (inconclusive)     halted (counterexample) 
$\varphi_4$   k = 2    UNSAT (inconclusive)     UNSAT (inconclusive)     non-halted (inconclusive) 
              k = 3    UNSAT (inconclusive)     UNSAT (inconclusive)     halted (proved) 

Table 1: Comparison of Properties with Different Semantics 


We have implemented the technique described in Section 4 in our tool HyperQube. 
Given a transition relation, the tool automatically unfolds it up to $k \geq 0$ by a 
home-grown procedure written in OCaml, called genqbf. Given the choice of the 
semantics (pessimistic, optimistic, and halting variants) the unfolded transition 
relation is combined with the QBF encoding of the input HyperLTL formula to 
form a complete QBF instance which is then fed to the QBF solver QuAbS [28]. 
All experiments in this section are run on an iMac desktop with Intel i7 CPU 
@3.4 GHz and 32 GB of RAM. A full description of the systems and formulas 
used can be accessed in the longer version of this paper [30]. 


Case Study 1: Symmetry in Lamport's Bakery algorithm [12]. Symmetry 
states that no specific process has special privileges in terms of a faster access 
to the critical section (see different symmetry formulas in Table 2). In these 
formulas, each process $P_n$ has a program counter denoted by $pc(P_n)$, select indicates 
which process is selected to process next, pause holds if both processes are not selected, 
sym_break indicates which process is selected after a tie, and $sym(select_{\pi_A}, select_{\pi_B})$ 
indicates whether two traces are selecting two opposite processes. The Bakery algorithm 
does not satisfy symmetry (i.e., $\varphi_{sym_1}$), because when two or more processes are 
trying to enter the critical section with the same ticket number, the algorithm 
always gives priority to the process with the smaller process ID. HyperQube returns 
SAT using the pessimistic semantics, indicating that there exists a counterexample 
in the form of a falsifying witness to $\pi_A$ in formula $\varphi_{sym_1}$. Table 3 includes 
our result on other symmetry formulas presented in Table 2. 


Case Study 2: Linearizability in SNARK [14]. SNARK implements a 
concurrent double-ended queue using double-compare-and-swap (DCAS) and a 
doubly linked-list that stores values in each node. Linearizability [29] requires 
that any history of execution of a concurrent data structure (i.e., sequence of 
invocation and response by different threads) matches some sequential order of 
invocations and responses (see formula $\varphi_{lin}$ in Table 2). SNARK is known to 
have two linearizability bugs and HyperQube returns SAT using the pessimistic 
semantics, identifying both bugs as two counterexamples. The bugs we identified 
are precisely the same as the ones reported in [14]. 
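Property          Property in HyperLTL 

Symmetry          $\varphi_{S1} = \forall \pi_A. \forall \pi_B.\ (\neg sym(select_{\pi_A}, select_{\pi_B}) \vee \neg(pause_{\pi_A} = pause_{\pi_B}))\ \mathcal{R}\ ((pc(P_0)_{\pi_A} = pc(P_1)_{\pi_B}) \wedge (pc(P_1)_{\pi_A} = pc(P_0)_{\pi_B}))$ 

                  $\varphi_{S2} = \forall \pi_A. \forall \pi_B.\ (\neg sym(select_{\pi_A}, select_{\pi_B}) \vee \neg(pause_{\pi_A} = pause_{\pi_B}) \vee \neg(select_{\pi_A} < 3) \vee \neg(select_{\pi_B} < 3))\ \mathcal{R}\ ((pc(P_0)_{\pi_A} = pc(P_1)_{\pi_B}) \wedge (pc(P_1)_{\pi_A} = pc(P_0)_{\pi_B}))$ 

                  $\varphi_{S3} = \forall \pi_A. \forall \pi_B.\ (\neg sym(select_{\pi_A}, select_{\pi_B}) \vee \neg(pause_{\pi_A} = pause_{\pi_B}) \vee \neg(select_{\pi_A} < 3) \vee \neg(select_{\pi_B} < 3) \vee \neg sym(sym\_break_{\pi_A}, sym\_break_{\pi_B}))\ \mathcal{R}\ ((pc(P_0)_{\pi_A} = pc(P_1)_{\pi_B}) \wedge (pc(P_1)_{\pi_A} = pc(P_0)_{\pi_B}))$ 

                  $\varphi_{sym_1} = \forall \pi_A. \exists \pi_B.\ \Box\, sym(select_{\pi_A}, select_{\pi_B}) \wedge (pause_{\pi_A} = pause_{\pi_B}) \wedge (pc(P_0)_{\pi_A} = pc(P_1)_{\pi_B}) \wedge (pc(P_1)_{\pi_A} = pc(P_0)_{\pi_B})$ 

                  $\varphi_{sym_2} = \forall \pi_A. \exists \pi_B.\ \Box\, sym(select_{\pi_A}, select_{\pi_B}) \wedge (pause_{\pi_A} = pause_{\pi_B}) \wedge (select_{\pi_A} < 3) \wedge (select_{\pi_B} < 3) \wedge (pc(P_0)_{\pi_A} = pc(P_1)_{\pi_B}) \wedge (pc(P_1)_{\pi_A} = pc(P_0)_{\pi_B})$ 

Linearizability   $\varphi_{lin} = \forall \pi_A. \exists \pi_B.\ \Box(history_{\pi_A} \leftrightarrow history_{\pi_B})$ 

NI                $\varphi_{NI} = \forall \pi_A. \exists \pi_B.\ (PIN_{\pi_A} \neq PIN_{\pi_B}) \wedge ((\neg halt_{\pi_A} \vee \neg halt_{\pi_B})\ \mathcal{U}\ ((halt_{\pi_A} \wedge halt_{\pi_B}) \wedge (Result_{\pi_A} = Result_{\pi_B})))$ 

Fairness          $\varphi_{fair} = \exists \pi_A. \forall \pi_B.\ (\Diamond m_{\pi_A}) \wedge (\Diamond NRR_{\pi_A}) \wedge (\Diamond NRO_{\pi_A}) \wedge ((\Box \bigwedge_{act \in Act_P} act_{\pi_A} \leftrightarrow act_{\pi_B}) \rightarrow ((\Diamond NRR_{\pi_B}) \leftrightarrow (\Diamond NRO_{\pi_B}))) \wedge ((\Box \bigwedge_{act \in Act_Q} act_{\pi_A} \leftrightarrow act_{\pi_B}) \rightarrow ((\Diamond NRR_{\pi_B}) \leftrightarrow (\Diamond NRO_{\pi_B})))$ 

Path Planning     $\varphi_{sp} = \exists \pi_A. \forall \pi_B.\ (\neg goal_{\pi_B}\ \mathcal{U}\ goal_{\pi_A})$ 

                  $\varphi_{rb} = \exists \pi_A. \forall \pi_B.\ (strategy_{\pi_B} \leftrightarrow strategy_{\pi_A})\ \mathcal{U}\ (goal_{\pi_A} \wedge goal_{\pi_B})$ 

Mutant            $\varphi_{mut} = \exists \pi_A. \forall \pi_B.\ (mut_{\pi_A} \wedge \neg mut_{\pi_B}) \wedge ((in_{\pi_A} \leftrightarrow in_{\pi_B})\ \mathcal{U}\ (out_{\pi_A} \not\leftrightarrow out_{\pi_B}))$ 


Table 2: Hyperproperties investigated in case studies. 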


Case Study 3: Non-interference in multi-threaded programs. Non- 
interference [25] states that low-security variables are independent of the 
high-security variables, thus preserving secure information flow. We consider 
the concurrent program example in [32], where PIN is a high-security input and 
Result is a low-security output. HyperQube returns SAT in the halting pessimistic 
semantics, indicating that there is a trace in which a difference in a 
high-security variable can be detected by observing a low-security variable, that 
is, a violation of non-interference. We also verified the correctness of a fix to 
this algorithm, also proposed in [32]. HyperQube uses the UNSAT result from the 
solver (with the halting optimistic semantics) to infer the absence of a 
violation, that is, to verify non-interference. 


Case Study 4: Fairness in non-repudiation protocols. A non-repudiation 
protocol ensures that a receiver obtains a receipt from the sender, called non- 
repudiation of origin (NRO), and the sender ends up with evidence, named 
non-repudiation of receipt (NRR), through a trusted third party. A 
non-repudiation protocol is fair if both NRR and NRO are either received or not 
received by the parties (see formula $\varphi_{fair}$ in Table 2). We verified two 
different protocols from [31], namely, $T_{incorrect}$, which chooses not to send 
out NRR after receiving NRO, and a correct implementation $T_{correct}$, which is 
fair. For $T_{correct}$ (respectively, $T_{incorrect}$), HyperQube returns UNSAT in 
the halting optimistic semantics (respectively, SAT in the halting pessimistic 
semantics), which indicates that the protocol satisfies (respectively, violates) 
fairness. 


Case Study 5: Path planning for robots. We have used HyperQube beyond 
verification, to synthesize strategies for robotic planning [34]. Here, we focus 
on producing a strategy that satisfies two control requirements for a robot to 
reach a goal in a grid. First, the robot should take the shortest path (see 
formula $\varphi_{sp}$ in Table 2). Fig. 2 shows a 10 x 10 grid, where the red, 
green, and black cells are initial, goal, and blocked cells, respectively. 
HyperQube returns SAT and the synthesized path is shown by the blue arrows. 
We also used HyperQube to solve the path robustness problem, meaning that 
starting from an arbitrary initial state, a robot reaches the goal by following a 
single strategy (see formula $\varphi_{rb}$ in Table 2). Again, HyperQube returns 
SAT for the grid shown in Fig. 3. 


Fig. 2: Shortest Path 

Fig. 3: Robust path 


Case Study 6: Mutation testing. We adopted the model from [15] and applied 
the original formula that describes a good test mutant together with the model 
(see formula $\varphi_{mut}$ in Table 2). HyperQube returns SAT, indicating that a 
qualified mutant was found. We note that in [15] the authors were not able to 
generate test cases via $\varphi_{mut}$, as the model checker MCHyper is not able 
to handle quantifier alternation in push-button fashion. 


Results and analysis. Table 3 summarizes our results including running times, 
the bounded semantics applied, the output of the QBF solver, and the resulting 
infinite inference conclusion using Theorem 1. As can be seen, our case studies 
range over model checking of different fragments of HyperLTL. It is important 
to note that the run time of HyperQube consists of generating a QBF formula with 
genqbf and then checking its satisfiability with QuAbS. Notably, in some cases, 
QBF formula generation takes longer than checking its satisfiability. The models 
in our experiments also have different sizes. The most complex case study is 
arguably the SNARK algorithm, where we identify both bugs in the algorithm 
in 472 and 1497 seconds. In cases 5.1-6.2, we also demonstrate the ability of 
HyperQube to solve synthesis problems by leveraging the existential quantifier in 
a HyperLTL formula. 
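
The way a bounded solver outcome is turned into a conclusion can be sketched in a few lines. The rule below is an assumption read off Tables 1 and 3 (SAT is conclusive under the pessimistic variants, UNSAT under the optimistic variants), not code taken from HyperQube; the function name `is_conclusive` and the string labels are hypothetical.

```python
from typing import Literal

Semantics = Literal["pes", "opt", "h-pes", "h-opt"]
Outcome = Literal["SAT", "UNSAT"]


def is_conclusive(outcome: Outcome, semantics: Semantics) -> bool:
    """Return True when a bounded QBF outcome can be lifted to the unbounded case.

    Assumption (inferred from the tables, not from the tool): SAT is
    conclusive under the pessimistic variants and UNSAT under the
    optimistic variants; every other combination is inconclusive.
    """
    pessimistic = semantics in ("pes", "h-pes")
    return (outcome == "SAT") == pessimistic
```

Under this reading, an inconclusive outcome only says that the chosen bound k or the chosen semantics was not sufficient, and a different bound or semantics has to be tried.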

Finally, we elaborate more on scalability of the path planning problem for 
robots. This problem was first studied in [34], where the authors reduce the 
problem to SMT solving using Z3 [13] and by eliminating the trace quantifiers 
through a combinatorial enumeration of conjunctions and disjunctions. Table 4 
compares our approach with the brute-force technique employed in [34] for differ- 
ent grid sizes. Our QBF-based approach clearly outperforms the solution in [34], 
in some cases by an order of magnitude. 
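
As a quick check of this claim, the speedups can be recomputed from the total times reported in Table 4. The snippet is only a worked calculation over the published numbers; the names `ROWS` and `speedups` are illustrative.

```python
# (formula, grid side, HyperQube total [s], total of the approach in [34] [s]),
# copied from Table 4.
ROWS = [
    ("sp", 10, 1.87, 8.64),
    ("sp", 20, 16.69, 131.06),
    ("sp", 40, 71.79, 1166.11),
    ("sp", 60, 226.66, 4892.86),
    ("rb", 10, 1.75, 11.59),
    ("rb", 20, 31.14, 52.26),
    ("rb", 40, 83.29, 235.97),
]


def speedups(rows):
    """Map (formula, grid side) to baseline time divided by HyperQube time."""
    return {(name, side): base / ours for name, side, ours, base in rows}


if __name__ == "__main__":
    for key, ratio in speedups(ROWS).items():
        print(key, round(ratio, 1))
```

The ratio exceeds 10 only for the two largest shortest-path grids, which matches the qualifier "in some cases".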
#    Model K              Formula             bound k  |AP*|  QBF    semantics  genqbf [s]  QuAbS [s]  Total [s]  Result 
0.1  Bakery.3proc         $\varphi_{S1}$      7        27     SAT    pes        0.44        0.04       0.48       ✗ 
0.2  Bakery.3proc         $\varphi_{S2}$      12       27     SAT    pes        1.31        0.15       1.46       ✗ 
0.3  Bakery.3proc         $\varphi_{S3}$      20       27     UNSAT  opt        2.86        4.87       7.73       ✓ 
1.1  Bakery.3proc         $\varphi_{sym_1}$   10       27     SAT    pes        0.86        0.11       0.97       ✗ 
1.2  Bakery.3proc         $\varphi_{sym_2}$   10       27     SAT    pes        0.76        0.17       0.93       ✗ 
1.3  Bakery.5proc         $\varphi_{sym_1}$   10       45     SAT    pes        23.57       1.08       24.65      ✗ 
1.4  Bakery.5proc         $\varphi_{sym_2}$   10       45     SAT    pes        29.92       1.43       31.35      ✗ 
2.1  SNARK-bug1           $\varphi_{lin}$     26       160    SAT    pes        88.42       383.60     472.02     ✗ 
2.2  SNARK-bug2           $\varphi_{lin}$     40       160    SAT    pes        718.09      779.76     1497.85    ✗ 
3.1  3-Thread incorrect   $\varphi_{NI}$      57       31     SAT    h-pes      19.56       46.66      66.22      ✗ 
3.2  3-Thread correct     $\varphi_{NI}$      57       31     UNSAT  h-opt      23.91       33.54      57.45      ✓ 
4.1  NRP: T_incorrect     $\varphi_{fair}$    15       15     SAT    h-pes      0.10        0.27       0.37       ✗ 
4.2  NRP: T_correct       $\varphi_{fair}$    15       15     UNSAT  h-opt      0.08        0.12       0.20       ✓ 
5.1  Shortest Path        (see Table 4) 
5.2  Initial State Robustness  (see Table 4) 
6.1  Mutant               $\varphi_{mut}$     8        6      SAT    h-pes      1.40        0.35       1.75 

Table 3: Performance of HyperQube, where column case# identifies the artifact, ✓ 
denotes satisfaction, and ✗ denotes violation of the formula. AP* is the set of Boolean 
variables encoding K. 
                                              HyperQube                            [34] 
Formula          grid size  bound k  |AP*|  genqbf [s]  QuAbS [s]  Total [s]  gensmt [s]  Z3 [s]   Total [s] 
$\varphi_{sp}$   $10^2$     20       12     1.30        0.57       1.87       8.31        0.33     8.64 
                 $20^2$     40       14     4.53        12.16      16.69      124.66      6.41     131.06 
                 $40^2$     80       16     36.04       35.75      71.79      1093.12     72.99    1166.11 
                 $60^2$     120      16     105.82      120.84     226.66     4360.75     532.11   4892.86 
$\varphi_{rb}$   $10^2$     20       12     1.40        0.35       1.75       11.14       0.45     11.59 
                 $20^2$     40       14     15.92       15.32      31.14      49.59       2.67     52.26 
                 $40^2$     80       16     63.16       20.13      83.29      216.16      19.81    235.97 


Table 4: Path planning for robots and comparison to [34]. All cases use the halting 
pessimistic semantics and the QBF solver returns SAT, meaning successful path synthesis. 


6 Related Work 


There has been a lot of recent progress in automatically verifying [12,22-24] and 
monitoring [1,6,7,20,21, 26,33] HyperLTL specifications. HyperLTL is also sup- 
ported by a growing set of tools, including the model checker MCHyper [12,24], the 
satisfiability checkers EAHyper [19] and MGHyper [17], and the runtime monitor- 
ing tool RVHyper [20]. The complexity of model checking for HyperLTL for tree- 
shaped, acyclic, and general graphs was rigorously investigated in [2]. The first 
algorithms for model checking HyperLTL and HyperCTL* using alternating au- 
tomata were introduced in [24]. These techniques, however, were not able to deal 
in practice with alternating HyperLTL formulas in a fully automated fashion. 
We also note that previous approaches that reduce model checking HyperLTL 
(typically for formulas without quantifier alternations) to model checking LTL 
can use BMC in the LTL model checking phase. However, this is a different 
approach from the one presented here, as these approaches simply instruct the 
model checker to use BMC after the problem has been fully reduced to an 
LTL model checking problem, while we avoid this translation. These algorithms 
were then extended to deal with hyperliveness and alternating formulas in [12] 
by finding a winning strategy in $\forall\exists$ games. In this paper, we take an 
alternative approach by reducing the model checking problem to QBF solving, which 
is arguably more effective for finding bugs (in case a finite witness exists). 


The satisfiability problem for HyperLTL has been shown to be decidable for the 
$\exists^*\forall^*$ fragment but undecidable for any fragment that includes a 
$\forall\exists$ quantifier alternation [16]. The hierarchy of hyperlogics beyond 
HyperLTL was studied in [11]. The synthesis problem for HyperLTL has been studied 
in [3] in the form of program repair, in [4] in the form of controller synthesis, 
and in [18] for the general case. 


7 Conclusion and Future Work 


We introduced the first bounded model checking (BMC) technique for verifi- 
cation of hyperproperties expressed in HyperLTL. To this end, we proposed 
four different semantics that ensure the soundness of inferring the outcome of 
the model checking problem. To handle trace quantification in HyperLTL, we re- 
duced the BMC problem to checking satisfiability of quantified Boolean formulas 
(QBF). This is analogous to the reduction of BMC for LTL to the simple 
propositional satisfiability problem. We have introduced different classes of 
semantics beyond the pessimistic semantics common in LTL model checking, namely 
optimistic semantics, which allow inferring full verification by observing only a 
finite prefix, and halting variations of these semantics, which additionally 
exploit the termination of the execution, when available. Through a rich set of 
case studies, we 
demonstrated the effectiveness and efficiency of our approach in verification of 
information-flow properties, linearizability in concurrent data structures, path 
planning in robotics, and fairness in non-repudiation protocols. 


As for future work, our first step is to solve the loop condition problem. This 
is necessary to establish completeness conditions for BMC and can help cover 
even more examples efficiently. The application of QBF-based techniques in the 
framework of abstraction/refinement is another unexplored area. Success of BMC 
for hyperproperties inherently depends on effectiveness of QBF solvers. Even 
though QBF solving is not as mature as SAT/SMT solving techniques, recent 
breakthroughs on QBF have enabled the construction of our tool HyperQube, and 
more progress in QBF solving will improve its efficiency. 
Abstract. We develop a framework for model checking infinite-state 
systems by automatically augmenting them with auxiliary variables, en- 
abling quantifier-free induction proofs for systems that would other- 
wise require quantified invariants. We combine this mechanism with a 
counterexample-guided abstraction refinement scheme for the theory of 
arrays. Our framework can thus, in many cases, reduce inductive rea- 
soning with quantifiers and arrays to quantifier-free and array-free rea- 
soning. We evaluate the approach on a wide set of benchmarks from the 
literature. The results show that our implementation often outperforms 
state-of-the-art tools, demonstrating its practical potential. 


1 Introduction 


Model checking is a widely-used and highly-effective technique for automated 
property checking. While model checking finite-state systems is a well-established 
technique for hardware and software systems, model checking infinite-state sys- 
tems is more challenging. One challenge, for example, is that proving properties 
by induction over infinite-state systems often requires the use of universally 
quantified invariants. While some automated reasoning tools can reason about 
quantified formulas, such reasoning is typically not very robust. Furthermore, 
just discovering these quantified invariants remains very challenging. 

Previous work (e.g., [52]) has shown that prophecy variables can some- 
times play the same role as universally quantified variables, making it possible 
to transform a system that would require quantified reasoning into one that 
does not. However, to the best of our knowledge, there has been no automatic 
method for applying such transformations. In this paper, we introduce a tech- 
nique we call counterexample-guided prophecy. During the refinement step of an 
abstraction-refinement loop, our technique automatically introduces prophecy 
variables, which both help with the refinement step and may also reduce the 
need for quantified reasoning. We demonstrate the technique in the context of 
model checking for infinite-state systems with arrays, a domain which is known 
for requiring quantified reasoning. We show how a standard abstraction for arrays 
can be augmented with counterexample-guided prophecy to obtain an algorithm 
that reduces the model checking problem to quantifier-free, array-free reasoning. 
The paper makes the following contributions: i) we introduce an algorithm 
called Prophecize which uses history and prophecy variables to target a spe- 
cific term at a specific time step of an execution, producing a new transition 
system that can effectively reason universally about that term; ii) we develop 
an automatic abstraction-refinement procedure for arrays, which leverages the 
Prophecize algorithm during the refinement step, and show that it is sound 
and produces no false positives; iii) we develop a prototype implementation of 
our technique; and iv) we evaluate our technique on four sets of model checking 
benchmarks containing arrays and show that our implementation outperforms 
state-of-the-art tools on a majority of the benchmark sets. 


2 Background 


We assume the standard many-sorted first-order logical setting with the usual 
notions of signature, term, formula, and interpretation. A theory is a pair 
$\mathcal{T} = (\Sigma, \mathbf{I})$ where $\Sigma$ is a signature and $\mathbf{I}$ 
is a class of interpretations, the models of $\mathcal{T}$. A $\Sigma$-formula 
$\varphi$ is satisfiable (resp., unsatisfiable) in $\mathcal{T}$ if it is satisfied 
by some (resp., no) interpretation in $\mathbf{I}$. Given an interpretation M, a 
variable assignment s over a set of variables X is a mapping that assigns each 
variable $x \in X$ of sort $\sigma$ to an element of $\sigma^{M}$, denoted $x^s$. 
We write M[s] for the interpretation that is equivalent to M except that each 
variable $x \in X$ is mapped to $x^s$. Let x be a variable, t a term, and $\phi$ a 
formula. We denote with $\phi\{x \mapsto t\}$ the formula obtained by replacing 
every free occurrence of x in $\phi$ with t. We extend this notation to sets of 
variables and terms in the usual way. If f and g are two functions, we write 
$f \circ g$ to mean functional composition, i.e., $f \circ g(x) = f(g(x))$. 
Let $\mathcal{T}_A$ be the standard theory of arrays [50] with extensionality, 
extended with constant arrays. Concretely, we assume sorts for arrays, indices, 
and elements, and function symbols read, write, and constarr. Here and below, we 
use a and b to refer to arrays, i and j to refer to array indices, and e and c to 
refer to array elements, where c is also restricted to be an interpreted constant. 
The theory contains the class of all interpretations satisfying the following 
axioms: 


$\forall a, i, j, e.\ (i = j \implies read(write(a, j, e), i) = e)\ \wedge\ (i \neq j \implies read(write(a, j, e), i) = read(a, i))$   (write) 
$\forall a, b.\ (\forall i.\ read(a, i) = read(b, i)) \implies a = b$   (ext) 
$\forall i.\ read(constarr(c), i) = c$   (const) 
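
To make the three axioms concrete, here is a minimal executable model of integer-indexed arrays in Python. It is only an illustration under our own assumptions (a default value plus finitely many overrides, kept in canonical form so that Python equality coincides with extensional equality); the class name `Arr` is hypothetical and nothing here is taken from the implementation evaluated in the paper.

```python
from typing import NamedTuple, Tuple


class Arr(NamedTuple):
    """An integer-indexed array: a default value plus finitely many overrides."""
    default: int
    overrides: Tuple[Tuple[int, int], ...]  # sorted (index, value) pairs


def constarr(c: int) -> Arr:
    """Array that holds c at every index, as in axiom (const)."""
    return Arr(c, ())


def read(a: Arr, i: int) -> int:
    return dict(a.overrides).get(i, a.default)


def write(a: Arr, j: int, e: int) -> Arr:
    cells = dict(a.overrides)
    if e == a.default:
        cells.pop(j, None)  # never store the default: keeps the form canonical
    else:
        cells[j] = e
    return Arr(a.default, tuple(sorted(cells.items())))
```

Because overrides never store the default value, two `Arr` values compare equal exactly when they agree at every index, which is the content of (ext) for this model.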


Symbolic Transition Systems and Model Checking. For generality, assume 
a background theory $\mathcal{T}$ with signature $\Sigma$. We will assume that all 
terms and formulas are $\Sigma$-terms and $\Sigma$-formulas, that entailment is 
entailment modulo $\mathcal{T}$, and interpretations are 
$\mathcal{T}$-interpretations. A symbolic transition system (STS) S is a tuple 
$S := \langle X, I, T \rangle$, where X is a finite set of state variables, I(X) 
is a formula denoting the initial states of the system, and T(X, X') is a formula 
expressing a transition relation. Here, X' is the set obtained by replacing each 
variable $x \in X$ with a new variable x' of the same sort. Let prime(x) = x' be 
the bijection corresponding to this replacement. We say that a variable x is frozen 
if $T \models x' = x$. When the state variables are obvious, we will often drop X. 

A state s of S is a variable assignment over X. An execution of S of length 
k is a pair $\langle M, \pi \rangle$, where M is an interpretation and 
$\pi := s_0, s_1, \ldots, s_{k-1}$ is a path of length k, a sequence of states such 
that $M[s_0] \models I(X)$ and $M[s_i][s_{i+1} \circ prime^{-1}] \models T(X, X')$ 
for all $0 \le i < k-1$. When reasoning about paths, it is often convenient to 
have multiple copies of the state variables X. We use X@n to denote the set of 
variables obtained by replacing each variable $x \in X$ with a new variable called 
x@n of the same sort. We refer to these as timed variables. A state s is reachable 
in S if it appears in a path of some execution of S. We say that a formula P(X) is 
an invariant of S, denoted by $S \models P(X)$, if P(X) is satisfied in every 
reachable state of S (i.e., for every execution $\langle M, \pi \rangle$, 
$M[s] \models P(X)$ for each s in $\pi$). The invariant checking problem is, given 
S and P(X), to determine if $S \models P(X)$. A counterexample is an execution 
$\langle M, \pi \rangle$ of S of length k such that $M[s_{k-1}] \not\models P(X)$. 
If $I(X) \models \phi(X)$ and $\phi(X) \wedge T(X, X') \models \phi(X')$, then 
$\phi(X)$ is an inductive invariant. Every inductive invariant is an invariant (by 
induction over path length). In this paper we focus on model checking problems 
where I, T and P are quantifier-free. However, a quantified inductive invariant 
might still be necessary to prove a property of the system. 

Bounded Model Checking (BMC) is a bug-finding technique which attempts 
to find a counterexample for a property, P(X), of length k for some finite k [9]. A 
single BMC query at bound k for an invariant property uses a constraint solver 
to check the satisfiability of the following formula: 
$BMC(S, P, k) := I(X@0) \wedge (\bigwedge_{i=0}^{k-1} T(X@i, X@(i+1))) \wedge \neg P(X@k)$. 
If the query is satisfiable, there is a bug. 
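
The meaning of a single BMC query can be illustrated on a small finite-state system by enumerating paths explicitly instead of calling a constraint solver. This is a brute-force sketch of what the formula asks for, not the symbolic encoding itself; the function `bmc` and its predicate arguments are illustrative names.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

State = TypeVar("State")


def bmc(states: Iterable[State],
        init: Callable[[State], bool],
        trans: Callable[[State, State], bool],
        prop: Callable[[State], bool],
        k: int) -> Optional[List[State]]:
    """Search for s_0..s_k with init(s_0), trans(s_i, s_i+1) and not prop(s_k).

    Returns such a path if one exists and None otherwise. The search is
    exhaustive, so it is only usable for very small state spaces.
    """
    universe = list(states)
    paths = [[s] for s in universe if init(s)]
    for _ in range(k):  # unroll the transition relation k times
        paths = [p + [t] for p in paths for t in universe if trans(p[-1], t)]
    for path in paths:
        if not prop(path[-1]):  # property is only checked at time k
            return path
    return None
```

For a counter modulo 8 that starts at 0 and increments by one, with the property x != 3, the query at bound 3 returns the path [0, 1, 2, 3], while the queries at bounds 0 to 2 return None.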


Counterexample-Guided Abstraction Refinement (CEGAR). CEGAR 
is a general technique in which a difficult conjecture is tackled iteratively [44]. 
Algorithm 1 shows a simple CEGAR loop for checking an invariant P for an STS 
S. It is parameterized by three functions. The Abstract function produces an 
initial abstraction of the problem. It must satisfy the contract that if 
$\langle \hat{S}, \hat{P} \rangle$ = Abstract(S, P), then 
$\hat{S} \models \hat{P} \implies S \models P$. The next function is the Prove 
function. This can be any (unbounded) model-checking algorithm that can return 
counterexamples. It checks whether a given property P is an invariant of a 
given STS S. If it is, it returns with proven set to true. Otherwise, it returns a 
bound k at which a counterexample exists. The final function is Refine. It takes 
the abstracted STS and property together with a bound k at which a known 
counterexample for the abstract STS exists. Its job is to refine the abstraction 
until there is no longer a counterexample of size k. If it succeeds, it returns the 
new STS and property. It fails if there is an actual counterexample of size k for 
the concrete system. In this case, it sets the return value refined to false. 
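
The control flow of this loop is small enough to write down directly. The sketch below mirrors the contract described above, with the three functions passed in as callables; the Python signatures are our own choice for illustration and the systems and properties are treated as opaque values.

```python
from typing import Any, Callable, Tuple


def sts_cegar(system: Any,
              prop: Any,
              abstract: Callable[[Any, Any], Tuple[Any, Any]],
              prove: Callable[[Any, Any], Tuple[int, bool]],
              refine: Callable[[Any, Any, int], Tuple[Any, Any, bool]]) -> bool:
    """Return True if the property is proved, False if a real counterexample exists.

    abstract(system, prop) -> (abs_system, abs_prop)
    prove(abs_system, abs_prop) -> (k, proven)
    refine(abs_system, abs_prop, k) -> (abs_system, abs_prop, refined)
    """
    abs_system, abs_prop = abstract(system, prop)
    while True:
        k, proven = prove(abs_system, abs_prop)  # try to prove
        if proven:
            return True  # property proved on the abstraction
        abs_system, abs_prop, refined = refine(abs_system, abs_prop, k)
        if not refined:
            return False  # counterexample of size k is real
```

Like the loop it mirrors, this function need not terminate: it stops only when Prove succeeds or Refine reports a concrete counterexample.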


Auxiliary variables. We finish this section with relevant background on auxil- 
iary variables, a crucial part of the refinement step described in Sec. 4. Auxiliary 
variables are new variables added to the system which do not influence its be- 
havior (i.e., the reduct to the old set of variables of any reachable state in the 
new system is a reachable state in the old system), but may assist in proofs. 
There are two main categories of auxiliary variables we consider: history and 
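Algorithm 1 STS-CEGAR($S := \langle X, I, T \rangle$, P) 

1: $\langle \langle \hat{X}, \hat{I}, \hat{T} \rangle, \hat{P} \rangle \leftarrow$ Abstract(S, P) 
2: while true do 
3:     $\langle k, proven \rangle \leftarrow$ Prove($\langle \hat{X}, \hat{I}, \hat{T} \rangle, \hat{P}$)   // try to prove 
4:     if proven then return true   // property proved 
5:     $\langle \langle \hat{X}, \hat{I}, \hat{T} \rangle, \hat{P}, refined \rangle \leftarrow$ Refine($\langle \hat{X}, \hat{I}, \hat{T} \rangle, \hat{P}, k$)   // try to refine 
6:     if $\neg refined$ then return false   // found counterexample 
7: end while 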


prophecy. History variables, also known as ghost state, preserve a value, mak- 
ing its past value available in future states. Prophecy variables are the dual of 
history variables and provide a way to refer to a value that occurs in a future 
state. Abadi and Lamport formally characterized soundness conditions for the 
introduction of history and prophecy variables [1]. Here, we consider a simple, 
structured form of history variables. 


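Definition 1. Let $S = \langle X, I, T \rangle$ be an STS, t a term whose free 
variables are in X, and n > 0, then Delay(S, t, n) returns a new STS and variable 
$\langle \langle X^h, I^h, T^h \rangle, h_t^n \rangle$, where 
$X^h = X \uplus \{h_t^1, \ldots, h_t^n\}$, $I^h = I$, and 
$T^h = T \cup \{h_t^{1\prime} = t\} \cup \bigcup_{i=2}^{n} \{h_t^{i\prime} = h_t^{i-1}\}$. 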

The Delay operator makes the current value of a term t available for the next 
n states in a path. This is accomplished by adding n new history variables and 
creating an assignment chain that passes the value to the next history variable 
at each state. Thus, $h_t^k$ contains the value that t had k states ago. The initial 
value of each history variable is unconstrained. 
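
The assignment chain can be simulated on a concrete sequence of values of t. The helper below is a sketch with hypothetical names; it models the unconstrained initial values of the history variables by a placeholder (`None`), which is an assumption made only for the simulation.

```python
from typing import List, Optional, Sequence, Tuple


def delay_trace(t_values: Sequence[int], n: int) -> List[Tuple[Optional[int], ...]]:
    """For each state, return the values of the n history variables (h1, ..., hn).

    t_values[j] is the value of the term t in state j. The initial history
    values are unconstrained in the definition; None stands in for them here.
    """
    history: List[Optional[int]] = [None] * n
    rows = []
    for t in t_values:
        rows.append(tuple(history))
        # next state: h1' = t, and hi' = h(i-1) for i = 2..n
        history = [t] + history[:-1]
    return rows
```

With t taking the values 10, 20, 30, 40 and n = 2, the history variables in the last state are (30, 20): the first holds the value of t one state ago and the second the value two states ago.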


Theorem 1. Let $S = \langle X, I, T \rangle$ be an STS, P a property, and 
Delay(S, v, n) = $\langle S^h, h_v^n \rangle$. Then $S \models P$ iff $S^h \models P$. 


We refer to [1] for a general proof which subsumes Theorem 1. In contrast to the 
general approach for history variables, we use a version of prophecy that only 
requires a single frozen variable. The motivation for this is that a frozen variable 
can be used in place of a universal quantifier, as the following theorem adapted 
from [52] shows. 


Theorem 2. Let $S = \langle X, I, T \rangle$ be an STS, x a variable in formula 
P(X), and v a fresh variable (i.e., not in X or X'). Let 
$S^p = \langle X \cup \{v\}, I, T \cup \{v' = v\} \rangle$. 
Then $S \models \forall x.\ P(X)$ iff $S^p \models P(X)\{x \mapsto v\}$. 


Theorem 2 shows that a universally quantified variable in an invariant can be 
replaced with a fresh symbol in a process similar to skolemization. The intuition 
is as follows. The frozen variable has the same value in all states, but it is 
uninitialized by I. Thus, for each path in S, there is a corresponding path (i.e., 
identical except at v) in $S^p$ for every possible value of v. This proliferation of 
paths plays the same role as the quantified variable in P. We mention here one 
more theorem from [52]. This one allows us to introduce a universal quantifier. 
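Algorithm 2 Prophecize($\langle X, I, T \rangle$, P(X), t, n) 

if n = 0 then 
    return $\langle \langle X \uplus \{p_t\}, I, T \cup \{p_t' = p_t\} \rangle,\ p_t = t \implies P(X),\ p_t \rangle$ 
else 
    $\langle \langle X^h, I^h, T^h \rangle, h_t^n \rangle$ := Delay($\langle X, I, T \rangle$, t, n) 
    return $\langle \langle X^h \uplus \{p_t^n\}, I^h, T^h \cup \{p_t^{n\prime} = p_t^n\} \rangle,\ p_t^n = h_t^n \implies P(X),\ p_t^n \rangle$ 
end if 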


Theorem 3. Let $S = \langle X, I, T \rangle$ be an STS, P(X) a formula, and t a term. 
Then, $S \models P(X)$ iff $S \models \forall y\,(y = t \implies P(X))$, where y is not free in P(X). 


Theorems 2 and 3 are special cases of Theorems 3 and 4 of [52]. The original 
theorems handle the more general case where P(X) can be a temporal formula. 


3 Using Auxiliary Variables to Assist Induction 


We can use Theorem 3 followed by Theorem 2 to introduce frozen prophecy 
variables that predict the value of a term t when the property P is being checked. 
We refer to t as the prophecy target and the process as universal prophecy. If we 
also use Delay, we can target a term at some finite number of steps before the 
property is checked. This is captured by Algorithm 2, which takes a transition 
system, property P(X), term t, and n > 0. If n = 0, it introduces a universal 
prophecy variable for t. Otherwise, it first introduces history variables for t and 
then applies universal prophecy to the delayed t. In either case it returns the 
augmented system, augmented property, and the prophecy variable. 

We will use the STS shown in Fig. 1(a) as a running example throughout 
the paper (it is inspired by the hardware example from [10]). We assume the 
background theory $\mathcal{T}$ includes integer arithmetic and arrays of integers 
indexed by integers. The variables in this STS include an array and four integer 
variables, representing the read index, write index, read data, and write data, 
respectively. The system starts with an array of all zeros. At every step, if the 
write data is less than 200, it writes that data to the array at the write index. 
Otherwise, the array stays the same. Additionally, the read data is updated with 
the current value of a at $i_r$. This effectively introduces a one-step delay 
between when the value is read from a and when the value is present in $d_r$. The 
property is that $d_r < 200$. This property is clearly true, but it is not 
straightforward to prove with standard model checking techniques because it is not 
inductive. Note that it is also not k-inductive for any k [59]. The primary issue 
is that it does not constrain the value of a at all, so in an inductive proof, the 
value of a could be anything in the induction hypothesis. 

One way to prove the property is to strengthen it with the quantified invari- 
ant: $\forall i.\ read(a, i) < 200$. Remarkably, observe that by augmenting the system 
using Prophecize, it is possible to prove the property using only a quantifier- 
free invariant. In this case, the relevant prophecy target is the value of $i_r$ one 
(a)  $I := a = constarr(0) \wedge d_r < 200$ 
     $T := a' = ite(d_w < 200, write(a, i_w, d_w), a) \wedge d_r' = read(a, i_r)$ 
     $P := d_r < 200$ 

(b)  $I := a = constarr(0) \wedge d_r < 200$ 
     $T := a' = ite(d_w < 200, write(a, i_w, d_w), a) \wedge d_r' = read(a, i_r) \wedge p^{1\prime}_{i_r} = p^1_{i_r} \wedge h^{1\prime}_{i_r} = i_r$ 
     $P := p^1_{i_r} = h^1_{i_r} \implies d_r < 200$ 


Fig. 1: (a) Running example. (b) Running example with prophecy variable. 


step before checking the property. We run Prophecize($\langle X, I, T \rangle$, P, $i_r$, 1) 
and it returns the system and property shown in Fig. 1(b), along with the prophecy 
variable $p^1_{i_r}$. This augmented system has a simple, quantifier-free invariant 
which can be used to strengthen the property, making it inductive: 
$read(a, p^1_{i_r}) < 200$. This formula holds in the initial state because of the 
constant array, and if we start in a state where it holds, it still holds after a 
transition. 
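
The claim can be sanity-checked by random simulation of the running example, with the frozen prophecy variable fixed to some index p before the run starts. This is a test harness under our own assumptions (bounded random inputs, a Python dictionary standing in for the array), not a proof and not part of the described tool; the function name is hypothetical.

```python
import random


def invariant_holds_on_run(steps: int, seed: int, p: int) -> bool:
    """Simulate the running example and check d_r < 200 and read(a, p) < 200.

    p plays the role of the frozen prophecy variable: it is chosen once and
    never changes. Inputs i_r, i_w and d_w are drawn at random in each step.
    """
    rng = random.Random(seed)
    a = {}                    # constarr(0): absent indices read as 0
    d_r = rng.randrange(200)  # initial states require d_r < 200
    for _ in range(steps):
        if not (d_r < 200 and a.get(p, 0) < 200):
            return False
        i_r, i_w = rng.randrange(-5, 6), rng.randrange(-5, 6)
        d_w = rng.randrange(-100, 400)
        next_d_r = a.get(i_r, 0)  # d_r' = read(a, i_r), using the current array
        if d_w < 200:
            a[i_w] = d_w          # a' = write(a, i_w, d_w)
        d_r = next_d_r
    return d_r < 200 and a.get(p, 0) < 200
```

Since the check passes for every fixed p, the simulation is consistent with the quantified invariant that the frozen variable stands in for.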

Notice that the invariant learned over the prophecy variable has the same 
form as the original quantified invariant. However, we have instantiated that uni- 
versal quantifier with a fresh, frozen prophecy variable. Intuitively, the prophecy 
variable captures a proof by contradiction: assume the property does not hold, 
consider the value of $i_r$ one step before the first failure of the property, and then 
use this value to show the property holds. This example shows that auxiliary 
variables can be used to transform an STS without a quantifier-free inductive 
invariant into an STS with one. However, it is not yet clear how to identify good 
targets for history and prophecy variables. In the next section, we show how this 
can be done as part of an abstraction refinement scheme for symbolic transition 
systems over the theory of arrays. 


4 Abstraction Refinement for Arrays 


We now introduce our main contribution. Given a background theory $\mathcal{T}_B$ and 
a model checking algorithm for STS's over $\mathcal{T}_B$, we use an instantiation of the 
CEGAR loop in Algorithm 1 to check properties of STS's over the theory that 
combines $\mathcal{T}_B$ and the theory of arrays, $\mathcal{T}_A$. The key idea is to 
abstract all array operators and then add array lemmas as needed during refinement. 


Abstract and Prove. We use a standard abstraction for the theory of arrays, 
which we denote Abstract-Arrays. Every array sort is replaced with an unin- 
terpreted sort, and the array variables are abstracted accordingly. Each constant 
array is replaced by a fresh abstract array variable, which is then constrained to 
be frozen (because constant arrays do not change over time). Additionally, we 
replace the read and write array operations with uninterpreted functions. Note 
that if the system contains multiple array sorts, we need to introduce a separate 
read and write function for each uninterpreted abstract array sort. Using unin- 
terpreted sorts and functions for abstracting arrays is a common technique in 
Satisfiability Modulo Theories [7] (SMT) solvers [32]. Intuitively, our initial ab- 
straction starts with memoryless arrays. We then incrementally refine the arrays’ 
$\hat{I} := \hat{a} = constarr0 \wedge d_r < 200$ 
$\hat{T} := \hat{a}' = ite(d_w < 200, write(\hat{a}, i_w, d_w), \hat{a}) \wedge d_r' = read(\hat{a}, i_r) \wedge constarr0' = constarr0$ 
$\hat{P} := d_r < 200$ 


Fig. 2: Result of calling Abstract on the example from Fig. 1(a) 


memory as needed. Fig. 2 shows the result of running Abstract-Arrays on the 
example from Fig. 1(a). Prove can be instantiated with any (unbounded) model 
checker that can accept expressions over the background theory $\mathcal{T}_B$ combined 
with the theory of uninterpreted functions. In particular, due to our abstraction, 
the model checker does not need to support the theory of arrays. 


Refine. Here, we explain the refinement approach for our array abstraction. At 
a high level, we solve a BMC problem over the abstract STS at bound k. We 
then look for violations of array axioms in the returned counterexample, and 
instantiate each violated axiom (this is essentially the same as the lazy array 
axiom instantiation approach used in SMT solvers [13,14,17,27]). We then lift 
these axioms to the STS-level by modifying the STS. It is this step that may 
require introducing auxiliary variables. The details are shown in Algorithm 3. 

We start by computing a set $Z$ of index terms with ComputeIndices; this
set is used in the lazy axiom instantiation step below. We add to $Z$ every
term that appears in a read or write operation in BMC($\hat{S}, \hat{P}, k$). We also
add a witness index for every array equality; the witness corresponds to a
skolemized existential variable in the contrapositive of axiom (ext). For soundness,
we must add an extra variable $\lambda_\sigma$ for each index sort $\sigma$ and constrain
it to be different from all the other index variables of the same sort (this is
based on the approach in [13]). Intuitively, this variable represents an arbitrary
index different from those mentioned in the STS. We assume that the
index sorts are from an infinite domain so that a distinct element is guaranteed
to exist. For simplicity of presentation, we also assume from now on that there
is only a single index sort (e.g. integers). Otherwise, $Z$ must be partitioned
by sort. For the abstract STS in Fig. 2, with $k = 1$, the index set would be
$Z = \{i_r@0, i_w@0, w_0@0, w_1@0, \lambda_{Int}@0, i_r@1, i_w@1, w_0@1, w_1@1, \lambda_{Int}@1\}$, where
$w_0$ and $w_1$ are witness indices.
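
The shape of this index set can be reproduced in a few lines of Python. This is a
minimal sketch, not the authors' ComputeIndices: the function name
`compute_indices`, its parameters, and the string encoding `name@step` are
assumptions made for illustration, and the number of witness indices per time
step is passed in rather than derived from the array equalities.

```python
def compute_indices(index_terms, num_witnesses, k, sort="Int"):
    """Build a timed index set for time steps 0..k (hypothetical helper)."""
    indices = []
    for step in range(k + 1):
        # every term used as an index in a read or write
        indices += [f"{term}@{step}" for term in index_terms]
        # witness indices for array equalities (count supplied by the caller)
        indices += [f"w_{j}@{step}" for j in range(num_witnesses)]
        # one extra "arbitrary other index" variable for the single index sort
        indices.append(f"lambda_{sort}@{step}")
    return indices


# Inputs read off the Fig. 2 example with k = 1 (an assumption of this sketch).
Z = compute_indices(["i_r", "i_w"], num_witnesses=2, k=1)
```

For these inputs the call yields ten timed index terms, five per time step, in
the same order as the set listed in the text.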

After computing indices, the algorithm enters the main loop. We first check
the BMC($\hat{S}, \hat{P}, k$) query. The result $\rho$ is either a counterexample, or the
distinguished value $\bot$, indicating that the query is unsatisfiable. If it is the latter,
then we return the refined STS and property, as the property now holds on the
STS up to bound $k$. Otherwise, we continue. The next step (line 5) is to find
violations of array axioms in the execution $\rho$ based on the index set $Z$.

Algorithm 3 Refine-Arrays($\hat{S} := \langle X, \hat{I}, \hat{T} \rangle$, $\hat{P}$, $k$)
 1: $Z \leftarrow$ ComputeIndices($\hat{S}, \hat{P}, k$)
 2: loop
 3:   $\rho \leftarrow$ BMC($\hat{S}, \hat{P}, k$)
 4:   if $\rho = \bot$ then return $\langle \langle X, \hat{I}, \hat{T} \rangle, \hat{P}, \mathit{true} \rangle$ // Property holds up to bound k
 5:   $\langle ca, nca \rangle \leftarrow$ CheckArrayAxioms($\rho, Z$)
 6:   if $ca = \emptyset \wedge nca = \emptyset$ then return $\langle \langle X, \hat{I}, \hat{T} \rangle, \hat{P}, \mathit{false} \rangle$ // True counterexample
 7:   // Go through non-consecutive array axiom instantiations
 8:   for $\langle ax, i@n_i \rangle \in nca$ do
 9:     let $n_{min} := \min(\tau(ax) \setminus \{n_i\})$
10:     $\langle \langle X^p, \hat{I}^p, \hat{T}^p \rangle, \hat{P}^p, p_i^{k-n_i} \rangle \leftarrow$ Prophecize($\langle X, \hat{I}, \hat{T} \rangle, \hat{P}, i, k - n_i$)
11:     $ax_c \leftarrow ax\{i@n_i \mapsto p_i^{k-n_i}@n_{min}\}$
12:     $ca \leftarrow ca \cup \{ax_c\}$ // add consecutive version of axiom
13:     $Z \leftarrow Z \cup \{p_i^{k-n_i}@0, \ldots, p_i^{k-n_i}@k\}$
14:     $X \leftarrow X^p$; $\hat{I} \leftarrow \hat{I}^p$; $\hat{T} \leftarrow \hat{T}^p$; $\hat{P} \leftarrow \hat{P}^p$
15:   end for
16:   // Go through consecutive array axiom instantiations
17:   for $ax \in ca$ do
18:     let $n_{min} := \min(\tau(ax))$, $n_{max} := \max(\tau(ax))$
19:     assert($n_{max} = n_{min} \vee n_{max} = n_{min} + 1$)
20:     if $k = 0$ then
21:       $\hat{I} \leftarrow \hat{I} \wedge ax\{X@n_{min} \mapsto X\}$
22:     else if $n_{min} = n_{max}$ then
23:       $\hat{T} \leftarrow \hat{T} \wedge ax\{X@n_{min} \mapsto X\} \wedge ax\{X@n_{min} \mapsto X'\}$
24:     else
25:       $\hat{T} \leftarrow \hat{T} \wedge ax\{X@n_{min} \mapsto X\}\{X@(n_{min}+1) \mapsto X'\}$
26:     end if
27:   end for
28: end loop

CheckArrayAxioms takes two arguments, a counterexample and an index set,
and returns instantiated array axioms that do not hold over the counterexample.
This works as follows. We first look for occurrences of write in the BMC formula.
For each such occurrence, we instantiate the (write) axiom so that the write
term in the axiom matches the term in the formula (i.e., we use the write term
as a trigger). This instantiates all quantified variables except for $i$. We then
instantiate $i$ once for each variable in the index set. We evaluate each of the
instantiated axioms using the values from the counterexample and keep those
instantiations that reduce to false. We do the same thing for the (const) axiom,
using each constant array term in the BMC formula as a trigger. Finally, for each
array equality $a@m = b@n$ in the BMC formula, we check an instantiation of the
contrapositive of (ext): $a@m \neq b@n \rightarrow \mathit{read}(a@m, w_i@n) \neq \mathit{read}(b@n, w_i@n)$.
We add instantiated formulas that do not hold in $\rho$ to the set of violated axioms.
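
The (write) check described above can be sketched as follows. This is an
illustration under stated assumptions, not the authors' CheckArrayAxioms: the
function name, the tuple encoding of write terms, and the table-based encoding
of the uninterpreted read and write functions in the counterexample are all
hypothetical. The standard (write) axiom is assumed: reading the written index
returns the written element, and reading any other index returns the value
stored in the base array.

```python
def violated_write_axioms(write_terms, index_set, val, write_uf, read_uf):
    """Return (write term, index variable) pairs whose (write) instance is false.

    write_terms: list of (base_array_var, index_var, element_var) triggers
    index_set:   timed index variables used to instantiate the quantified index
    val:         variable -> value in the abstract counterexample
    write_uf:    (array value, index value, element value) -> array value
    read_uf:     callable (array value, index value) -> element value
    """
    violations = []
    for base, j, e in write_terms:
        written = write_uf[(val[base], val[j], val[e])]  # value of write(base, j, e)
        for i in index_set:
            if val[i] == val[j]:
                holds = read_uf(written, val[i]) == val[e]
            else:
                holds = read_uf(written, val[i]) == read_uf(val[base], val[i])
            if not holds:
                violations.append(((base, j, e), i))
    return violations


# A made-up abstract counterexample: the "memoryless" array forgets index 3.
val = {"a@0": "A0", "i_w@0": 5, "d_w@0": 7, "i_r@2": 3}
write_uf = {("A0", 5, 7): "A1"}
read_table = {("A1", 5): 7, ("A1", 3): 300, ("A0", 3): 0}
read_uf = lambda arr, idx: read_table.get((arr, idx), 0)

result = violated_write_axioms(
    [("a@0", "i_w@0", "d_w@0")], ["i_w@0", "i_r@2"], val, write_uf, read_uf
)
```

In this made-up model only the instance with index `i_r@2` is violated. Because
its time step differs from the write term's time step by two, the instance
would be classified as non-consecutive.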


CheckArrayAxioms sorts the collected axiom instantiations into two sets
based on which timed variables they contain. The consecutive set contains
formulas with timed variables whose timing differs by at most one, whereas the
timed variables in the formulas contained in the non-consecutive set may differ
by more. Formally, let $\tau$ be a function which takes a single timed variable and
returns its time (e.g., $\tau(i@2) = 2$). We lift this to formulas by having $\tau(\phi)$
return the set of all time-steps for variables in $\phi$. A formula $\phi$ is consecutive iff
$\max(\tau(\phi)) - \min(\tau(\phi)) \leq 1$. Note that instantiations of (ext) are consecutive by
construction. Additionally, because constant arrays have the same value in all
time steps, we can always choose a representative time step for instantiations of
(const) that results in a consecutive formula. However, instantiations of (write)
may be non-consecutive, because the variable from the index set may be from
a time step that is different from that of the trigger term. CheckArrayAxioms
returns the pair $\langle ca, nca \rangle$, where $ca$ is a set of consecutive axiom instantiations
and $nca$ is a set of pairs, each of which contains a non-consecutive axiom
instantiation and the index-set variable that was used to create that instantiation.
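
The time function and the consecutive/non-consecutive split can be illustrated
on formulas written as strings. The sketch below assumes timed variables are
spelled `name@n`; the helper names `tau`, `is_consecutive`, and `split_axioms`
are hypothetical and only mirror the definitions in the text.

```python
import re


def tau(formula):
    """Set of time steps of all timed variables (written name@n) in a formula."""
    return {int(n) for n in re.findall(r"@(\d+)", formula)}


def is_consecutive(formula):
    """A formula is consecutive iff its time steps span at most one step."""
    steps = tau(formula)
    return not steps or max(steps) - min(steps) <= 1


def split_axioms(instantiations):
    """Split (formula, index_variable) pairs into the sets ca and nca."""
    ca, nca = [], []
    for formula, index_var in instantiations:
        if is_consecutive(formula):
            ca.append(formula)
        else:
            nca.append((formula, index_var))
    return ca, nca


instantiations = [
    ("read(a@1, i_r@1) = d_r@1", "i_r@1"),
    ("i_r@2 = i_w@0 -> read(write(a@0, i_w@0, d_w@0), i_r@2) = d_w@0", "i_r@2"),
]
ca, nca = split_axioms(instantiations)
```

The second instantiation mixes time steps 0 and 2, so it lands in `nca`
together with the index-set variable that produced it.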

At line 6, we check if the returned sets are empty. If so, then there are no array
axiom violations and $\rho$ is a concrete counterexample. In this case, the system,
property, and false are returned. Otherwise, we process the two sets. In lines
8-15, we process the non-consecutive formulas. Given a non-consecutive formula
$ax$ together with its index-set variable $i@n_i$, we first compute the minimum
time-step of the axiom's other variables, $n_{min}$. We then use the Prophecize method
to create a prophecy variable $p_i^{k-n_i}$, which is effectively a way to refer to $i@n_i$ at
time-step $n_{min}$ (line 10). This allows us to create a consecutive formula $ax_c$ that
is semantically equivalent to $ax$ (line 11). This new consecutive formula is added
to $ca$ in line 12, and in line 13 the introduced prophecy variables (one for each
time-step) are added to the index set. Then, line 14 updates the abstraction.
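
To make the role of the auxiliary variables concrete, here is a string-level
sketch of a Prophecize-like step. It is an assumption-laden illustration, not
the authors' implementation: the function name, the variable naming scheme, and
in particular the chaining of several history variables for delays greater than
one are hypothetical. Only the delay-one shape (one history variable tracking
the target, one frozen prophecy variable, and a guarded property) is taken from
the example discussed in this section.

```python
def prophecize(variables, trans, prop, target, delay):
    """Add `delay` history variables and one frozen prophecy variable."""
    hist = [f"h_{target}^{j}" for j in range(1, delay + 1)]
    proph = f"p_{target}^{delay}"
    updates = []
    prev = target
    for h in hist:
        # each history variable stores the previous value of its predecessor
        updates.append(f"{h}' = {prev}")
        prev = h
    # the prophecy variable never changes
    updates.append(f"{proph}' = {proph}")
    new_trans = " & ".join([trans] + updates)
    # the property is only required when the prophecy matches the delayed target
    new_prop = f"({proph} = {prev}) -> ({prop})"
    return variables + hist + [proph], new_trans, new_prop, proph


variables, trans, prop, proph = prophecize(
    ["a", "i_r", "i_w", "d_r", "d_w"], "T", "d_r < 200", "i_r", 1
)
```

With target `i_r` and delay 1, the sketch produces a history variable
`h_i_r^1`, a prophecy variable `p_i_r^1`, and the guarded property
`(p_i_r^1 = h_i_r^1) -> (d_r < 200)`.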

At line 17, we are left with a set of consecutive formulas to process. For each
consecutive formula $ax$, we compute the minimum and maximum time-step of
its variables (line 18), which must differ by no more than 1 (line 19). There are
three cases to consider: i) when $k = 0$, the counterexample consists of only the
initial state, so we refine the initial state by adding the untimed version of $ax$
to $\hat{I}$ (line 21); ii) if $ax$ contains only variables from a single time-step, then we
add the untimed version of $ax$ as a constraint for both $X$ and $X'$, ensuring that
it will hold in every state (line 23); iii) finally, if $ax$ contains variables from two
adjacent time-steps, we can translate this directly into a transition formula to
be added to $\hat{T}$ (line 25). The loop then repeats with the newly refined STS.
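
The three cases can be sketched on string formulas. The code below is a
hypothetical illustration: `untime` and `lift` are invented names, formulas are
plain strings with timed variables written `name@n`, and conjunction is written
`&`. It mirrors lines 18-25 of Algorithm 3 but is not the authors' code.

```python
import re

TIMED = re.compile(r"([A-Za-z_][\w^]*)@(\d+)")


def untime(formula, n_min, shift=0):
    """Map x@n_min to x and x@(n_min+1) to x'; shift=1 adds one more prime."""
    def repl(match):
        primes = int(match.group(2)) - n_min + shift
        return match.group(1) + "'" * primes
    return TIMED.sub(repl, formula)


def lift(init, trans, ax, k):
    """Add a consecutive axiom instance to the initial or transition formula."""
    steps = {int(n) for _, n in TIMED.findall(ax)}
    n_min, n_max = min(steps), max(steps)
    assert n_max in (n_min, n_min + 1), "axiom must be consecutive"
    if k == 0:
        # case i: only the initial state exists
        init = f"{init} & ({untime(ax, n_min)})"
    elif n_min == n_max:
        # case ii: constrain both current-state and next-state variables
        trans = f"{trans} & ({untime(ax, n_min)}) & ({untime(ax, n_min, shift=1)})"
    else:
        # case iii: the axiom already relates two adjacent time steps
        trans = f"{trans} & ({untime(ax, n_min)})"
    return init, trans


single_step = lift("I", "T", "read(a@1, i_r@1) = d_r@1", 3)
two_steps = lift("I", "T", "a@1 = write(a@0, i_w@0, d_w@0)", 3)
initial_only = lift("I", "T", "a@0 = c@0", 0)
```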


Example. Consider again the example from Fig. 2, and suppose Refine-Arrays
is called on $\hat{S}$ and $\hat{P}$ with $k = 3$. At this unrolling, one possible abstract
counterexample violates the following nonconsecutive axiom instantiation:

$(i_r@2 = i_w@0 \rightarrow \mathit{read}(\mathit{write}(\hat{a}@0, i_w@0, d_w@0), i_r@2) = d_w@0) \wedge$
$(i_r@2 \neq i_w@0 \rightarrow \mathit{read}(\mathit{write}(\hat{a}@0, i_w@0, d_w@0), i_r@2) = \mathit{read}(\hat{a}@0, i_r@2))$

Calling Prophecize($\hat{S}, \hat{P}, i_r, 1$) returns the new STS
$\langle X \cup \{h_{i_r}^1, p_{i_r}^1\}, \hat{I}, \hat{T} \wedge (h_{i_r}^1)' = i_r \wedge (p_{i_r}^1)' = p_{i_r}^1 \rangle$
and the new property $p_{i_r}^1 = h_{i_r}^1 \rightarrow d_r < 200$. The history variable
$h_{i_r}^1$ makes the previous value of $i_r$ available at each time-step, and the prophecy
variable $p_{i_r}^1$ mimics a universally quantified variable. We substitute $p_{i_r}^1@0$ for
$i_r@2$ to obtain a consecutive formula. Its untimed version (and a primed version)
is added to the transition relation.
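
Written out, the substitution in this example would give the following
formulas. This is a worked reconstruction under the notation assumed here
($p_{i_r}^{1}$ for the prophecy variable, $\hat{a}$ for the abstract array), not
a formula quoted from the paper; the `aligned` environment assumes the amsmath
package.

```latex
% Consecutive version: i_r@2 replaced by the prophecy variable at time 0.
\[
\begin{aligned}
&\bigl(p_{i_r}^{1}@0 = i_w@0 \rightarrow
  \mathit{read}(\mathit{write}(\hat{a}@0, i_w@0, d_w@0),\, p_{i_r}^{1}@0) = d_w@0\bigr) \;\wedge \\
&\bigl(p_{i_r}^{1}@0 \neq i_w@0 \rightarrow
  \mathit{read}(\mathit{write}(\hat{a}@0, i_w@0, d_w@0),\, p_{i_r}^{1}@0)
  = \mathit{read}(\hat{a}@0,\, p_{i_r}^{1}@0)\bigr)
\end{aligned}
\]
% Untimed version: every variable is at time 0, so X@0 is mapped to X.
\[
\begin{aligned}
&\bigl(p_{i_r}^{1} = i_w \rightarrow
  \mathit{read}(\mathit{write}(\hat{a}, i_w, d_w),\, p_{i_r}^{1}) = d_w\bigr) \;\wedge \\
&\bigl(p_{i_r}^{1} \neq i_w \rightarrow
  \mathit{read}(\mathit{write}(\hat{a}, i_w, d_w),\, p_{i_r}^{1})
  = \mathit{read}(\hat{a},\, p_{i_r}^{1})\bigr)
\end{aligned}
\]
```

Since all timed variables in the substituted formula belong to time step 0, it
falls under the single-time-step case, which is why both an untimed and a
primed copy are added.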
We stress that processing nonconsecutive axioms using Prophecize is how
we automatically discover the universal prophecy variable $p_{i_r}^1$, and it is exactly
the universal prophecy variable that was needed in Sec. 3 to prove correctness of
the running example. An alternative approach could avoid nonconsecutive axioms
using Craig interpolants [26] so that only consecutive axioms are found [15].
However, quantifier-free interpolants are not guaranteed to exist for the standard
theory of arrays, and the auxiliary variables found using nonconsecutive axioms
are needed to improve the chances of finding a quantifier-free inductive invariant.

It is important to have enough prophecy variables to assist in constructing 
inductive invariants. We found that we could often obtain a larger, richer set of 
prophecy variables by weakening our array abstraction. We do this by replacing 
equality between arrays by an uninterpreted predicate, and also checking the con- 
gruence axiom, the converse of (ext). Since more axioms are checked, there are 
more opportunities to introduce auxiliary variables. We call this weak abstrac- 
tion (WA) as opposed to strong abstraction (SA), which uses regular equality 
between abstract arrays and guarantees congruence through UF axioms. 

On the other hand, an excessive number of unnecessary auxiliary variables 
could overwhelm the Prove step. Thus, an improvement not shown in Algorithm 
3 is to check consecutive axioms first and only add nonconsecutive ones when 
necessary. This is the motivation behind the custom array solver implementation 
CheckArrayAxioms based on [13]. In principle, we could have used an SMT solver 
to find array axioms, but it would give no preference to consecutive axioms. Sim- 
ilarly, we could overwhelm the algorithm with unnecessary consecutive axioms. 
CheckArrayAxioms can still produce hundreds or even thousands of (consecutive)
axiom instantiations. Once these are lifted to the transition system, some
may be redundant. To mitigate this issue, when the BMC check returns $\bot$ and
we are about to return (line 4), we keep only axioms that appear in the unsat 
core of the BMC formula [22]. 


Correctness. We now state two important correctness theorems. Note that here 
and below, proofs are omitted due to space constraints. An extended version with 
proofs is available at: https://arxiv.org/abs/2101.06825. 


Theorem 4. Algorithm 1, instantiated with Abstract-Arrays, a model-check- 
er Prove as described above, and Refine-Arrays is sound. 


Theorem 5. If Algorithm 1, instantiated with Abstract-Arrays, Prove as 
described above, and Refine-Arrays, returns false, there is a concrete coun- 
terexample of length k in the concrete transition system. 


5  Expressiveness and Limitations 


We now address the expressiveness of counterexample-guided prophecy with 
regard to the introduction of auxiliary variables. For simplicity, we ignore the 
array abstraction, relying on the correctness theorems. An inductive invariant 
using auxiliary variables can be converted to one without auxiliary variables
by first universally quantifying over the prophecy variables, then existentially
quantifying over the history variables. The details are captured by this theorem: 


Theorem 6. Let $S := \langle X, I, T \rangle$ be an STS, and $P(X)$ be a property such
that $S \models P(X)$. Let $H$ be the set of history variables, and $\mathcal{P}$ be the set of
prophecy variables introduced by Refine-Arrays. Let $\tilde{S} := \langle X \cup H \cup \mathcal{P}, \tilde{I}, \tilde{T} \rangle$
and $\tilde{P} := (\bigwedge_{p \in \mathcal{P}} p = t(p)) \rightarrow P(X)$ be the system and property with
auxiliary variables. The function $t$ maps prophecy variables to their target term from
Prophecize. If $\mathit{Inv}(X, H, \mathcal{P})$ is an inductive invariant for $\tilde{S}$ and entails $\tilde{P}$, then
$\exists H\, \forall \mathcal{P}\; \mathit{Inv}(X, H, \mathcal{P})$ is an inductive invariant for $S$ and entails $P$, where $\exists H$ and
$\forall \mathcal{P}$ bind each variable in the set with the corresponding quantifier.


Although the invariants found using counterexample-guided prophecy correspond
to $\exists\forall$ invariants over the unmodified system, we must acknowledge that
the existential power is very weak. The existential quantifier is only used to re- 
move history variables. While history variables can certainly be employed for 
existential power in an invariant [55], these specific history variables are intro- 
duced solely to target a term for prophecy and only save a term for some fixed, 
finite number of steps. Thus, we do not expect to gain much existential power in 
finding invariants on practical problems. This use of history and prophecy vari- 
ables can be thought of as quantifier instantiation at the model checking level, 
where the instantiation semantically uses a term appearing in an execution of 
the system. Consequently, our technique performs well on systems where there is 
only a small number of instantiations needed over terms that are not too distant 
in time from a potential property violation that must be disproved (i.e., not 
many history variables are required). This appears to be a common situation for 
invariant-finding benchmarks, as we show empirically in Sec. 6. 


Limitations. If our CEGAR loop terminates, it either terminates with a proof or 
with a true counterexample. However, it is possible that the procedure may not 
terminate. In particular, while we can always refine the abstraction for a given 
bound k, there is no guarantee that this will eventually result in a refinement 
that rules out all spurious counterexamples (of any length). 

This failure mode occurs, for instance, when no finite number of instantiations
can capture all the relevant indices of the array. Consider an example system
with $I := a = \mathit{constarr}(0)$, $T := a' = \mathit{write}(a, i_0, \mathit{read}(a, i_1) + 1)$, and
$P := \mathit{read}(a, i_r) \geq 0$. The array $a$ is initialized with 0 at every index, and at every
step, $a$ is updated at a single index by reading from an arbitrary index of $a$ and
adding 1 to the result. Note that the index variables are unconstrained: they
can range over the integers freely at each time step. Then, the property is that
every element of $a$ is non-negative. This property clearly holds because of a quantified
invariant maintained by the system: $\forall i\,.\ \mathit{read}(a, i) \geq 0$.
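
A small concrete simulation makes the behaviour of this system tangible. The
sketch below is only an illustration: it runs the concrete (not the abstract)
system with randomly chosen indices, the function name `simulate` and the
bounded index range are assumptions, and it checks non-negativity because the
array starts at 0 everywhere.

```python
import random
from collections import defaultdict


def simulate(steps, seed=0, index_range=1000):
    """Run the concrete system and check the quantified invariant at each step."""
    rng = random.Random(seed)
    a = defaultdict(int)  # a = constarr(0): every index reads 0
    written = set()
    for _ in range(steps):
        i0 = rng.randrange(-index_range, index_range)  # unconstrained write index
        i1 = rng.randrange(-index_range, index_range)  # unconstrained read index
        a[i0] = a[i1] + 1  # a' = write(a, i0, read(a, i1) + 1)
        written.add(i0)
        # untouched indices still read 0, so checking stored entries suffices
        assert all(value >= 0 for value in a.values())
    return a, written


a, written = simulate(500)
```

The invariant holds at every step, while the set of written indices keeps
changing with the random choices, which gives some intuition for why a fixed,
finite set of index instantiations does not cover every relevant index.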

However, the initial abstraction is a memoryless array which can easily vi- 
olate the property by returning negative values from reads. Since the array is 
updated in each step at an arbitrary index based on a read from another arbi- 
trary index, no finite number of prophecy variables can capture all the relevant 
indices. The abstraction will successively rule out longer finite spurious counterexamples, but
will never be refined enough to prove the property unboundedly. We believe that
this limitation can be addressed in future work, perhaps by adapting techniques 
from [52]. However, it is not yet clear how to automate that process. Note that 
an even simpler system which does not add 1 in the update would already be 
problematic; however, for that case, it is straightforward to extend our algorithm 
to have it learn that the array does not change. 

A related, but less fundamental issue is that the index set might not contain 
the best choice of targets for prophecy. While the index set is sufficient for ruling 
out bounded counterexamples, it is possible there is a better target for universal 
prophecy that does not appear in the index set. However, based on the evaluation 
in Sec. 6, it appears that the index set does work well in practice. 


6 Experiments 


Implementation. In this section, we evaluate a prototype implementation 
of counterexample-guided prophecy, which instantiates Prove with ic3ia [34] 
(downloaded Apr 27, 2020), an open-source C++ implementation of IC3 via 
Implicit Predicate Abstraction (IC3IA) [20], which is itself a CEGAR loop that 
uses implicit predicate abstraction to perform IC3 [12] on infinite-state systems 
and uses interpolants to find new predicates. ic3ia uses MathSAT [21] (version 
5.6.3) as the backend SMT solver and interpolant producer. We call our proto- 
type prophic3 [48]. In our implementation, we also include a simple abstraction- 
refinement wrapper which abstracts large constant integers and refines them with 
the actual values if that fails. This is especially useful for dealing with software 
benchmarks with large constant loop bounds. Otherwise, the system might need 
to be unrolled to a very large bound to reach an abstract counterexample. 


Setup. We evaluate our tool against three state-of-the-art tools for inferring uni- 
versally quantified invariants over linear arithmetic and arrays: freqhorn, quic3, 
and gspacer. All these tools are Constrained Horn Clause (CHC) solvers built 
on Z3 [54]. The algorithm implemented in freqhorn [28] is a syntax-guided syn- 
thesis [4] approach for inferring universally quantified invariants over arrays [29]. 
quic3 is built on Spacer [40], the default CHC engine in Z3, and extends IC3 
over linear arithmetic and arrays to allow universally quantified frames (frames 
are candidates for inductive invariants maintained by the IC3 algorithm). It also 
maintains a set of quantifier instantiations which are provided to the underly- 
ing SMT solver. quic3 was recently incorporated into Z3. We used Z3 version 
4.8.9 with parameters suggested by the quic3 authors (fp.spacer.q3.use_qgen=true
fp.spacer.ground_pobs=false fp.spacer.mbqi=false fp.spacer.use_euf_gen=true).
Finally, gspacer is an extension of Spacer which adds three new inference rules
for improving local generalizations with global guidance. While this last technique
does not specifically target universally quantified invariants, it can be used along
with the quic3 options in Spacer and potentially executes a much different search.
The gspacer submission [43] won the arrays category in CHC-COMP 2020 [58]. We also
include ic3ia and the default configuration of Spacer in our results, neither of
which can produce universally quantified invariants. Our default configuration
of prophic3 uses weak abstraction, but we also include a version running strong
abstraction (prophic3-SA) in our experiments. We chose to build our prototype
on ic3ia instead of Spacer, in part because we needed uninterpreted functions
for our array abstraction, and Spacer does not handle them in a straightforward
way, due to the semantics of CHC [11].

tool        | freqhorn (81) | quic3 (42) | vizel (32)    | chc-comp (501) | total
            | safe          | safe       | safe  unsafe  | safe    unsafe | safe     unsafe
prophic3    | 67/4          | 42/0       | 20/3  1       | 43/159  59     | 172/166  60
prophic3-SA | 62/4          | 37/0       | 19/3  1       | 36/160  67     | 154/167  68
freqhorn    | 65/4          | 0/0        | 0/1   0       | 5/46    1      | 70/51    1
quic3       | 55/4          | 34/0       | 15/4  1       | 74/137  75     | 178/145  76
gspacer     | 35/5          | 27/0       | 18/4  1       | 66/138  94     | 146/147  95
ic3ia       | 0/4           | 0/0        | 0/3   1       | 0/158   59     | 0/165    60
spacer      | 0/5           | 0/0        | 0/4   1       | 0/134   77     | 0/143    78

Fig. 3: Experimental results. The safe results are reported as # Q / # QF. The second
column per group shows unsafe results; the first two groups had only safe benchmarks.

We compare these solvers on four benchmark sets: i) freqhorn - benchmarks
from the freqhorn paper [29]; ii) quic3 - benchmarks from the quic3 paper [37]
(these were C programs from SV-COMP [8] that were modified to require
universally quantified invariants); iii) vizel - additional benchmarks provided to us
by the authors of [37]; and iv) chc-comp-2020 - the array category benchmarks
of CHC-COMP 2020 [57]. Additionally, we sort the benchmarks into three cate- 
gories: 1) Q - safe benchmarks solved by some tool supporting quantified invari- 
ants but none of the solvers that do not; 2) QF - those solved by at least one of 
the tools that do not support quantified invariants, plus any unsafe benchmarks; 
and 3) U - unsolved benchmarks. Because not all of the benchmark sets were 
guaranteed to require quantifiers, this is an approximation of which benchmarks 
required quantified reasoning to prove safe. 
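
The categorization rule can be stated as a short function. This is a sketch of
the rule as described in the text; the function name, the result encoding
(`"safe"`, `"unsafe"`, or `None` for unsolved), and the partition of tools into
the two sets below are assumptions made for illustration.

```python
# Tools that cannot produce universally quantified invariants (per the text).
QF_TOOLS = {"ic3ia", "spacer"}


def categorize(results):
    """Classify one benchmark as 'Q', 'QF', or 'U'.

    results: dict mapping tool name -> 'safe', 'unsafe', or None (unsolved).
    """
    solved = {tool: res for tool, res in results.items() if res is not None}
    if not solved:
        return "U"  # no tool solved the benchmark
    if "unsafe" in solved.values():
        return "QF"  # unsafe benchmarks are counted with QF
    if any(tool in QF_TOOLS for tool in solved):
        return "QF"  # solved without quantified invariants
    return "Q"  # safe, and only quantifier-capable tools solved it


examples = [
    categorize({"prophic3": "safe", "ic3ia": None}),
    categorize({"quic3": "safe", "spacer": "safe"}),
    categorize({"quic3": "unsafe"}),
    categorize({"prophic3": None, "spacer": None}),
]
```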

Both prophic3 and ic3ia take a transition system and property specified 
in the Verification Modulo Theories (VMT) format [23], which is a transition 
system format built on SMT-LIB [6]. All other solvers read the CHC format. 
We translated benchmark sets i and iv from CHC to VMT using the horn2vmt
program which is distributed with ic3ia. For benchmark sets ii and iii, we
started with the C programs and generated both VMT and CHC using Kratos2
(an updated version of Kratos [19]). We ran all experiments on a 3.5GHz Intel
Xeon E5-2637 v4 CPU with a timeout of 2 hours and a memory limit of 32GB.
An artifact for reproducing these results is publicly available [49,38]. 


Results. The results are shown in Fig. 3. We first observe that prophic3 solves 
the most benchmarks in each of the first three sets, both overall and in category 
Q. The quic3 (and most of the freqhorn) benchmarks require quantified invari- 
ants; thus, ic3ia and Spacer cannot solve any of them. On solved instances in 
the Q category, prophic3 introduced an average of 1.2 prophecy variables and a
median of 1. This makes sense because, upon inspection, most benchmarks only
require one quantifier and we are careful to only introduce prophecy variables 
when needed. On benchmarks it cannot solve, ic3ia either times out or fails 
to compute an interpolant. This is expected because quantifier-free interpolants 
are not guaranteed over the standard theory of arrays. Even without arrays, it is 
also possible for prophic3 to fail to compute an interpolant, because MathSAT’s 
interpolation procedure is incomplete for combinations with non-convex theories 
such as integers. However, this was rarely observed in practice. 


We also observe that prophic3-SA solves fewer benchmarks in the first three 
sets. However, it is faster on commonly solved instances. This makes sense be- 
cause it needs to check fewer axioms (it uses built-in equality and thus does not 
check equality axioms). We suspect that it solves fewer benchmarks in the first 
three sets because it was unable to find the right prophecy variable. For exam- 
ple, for the standard_find_true-unreach-call_ground benchmark in the quic3 
set, a prophecy variable is needed to find a quantifier-free invariant. However, 
because of the stronger reasoning power of SA, the system can be sufficiently re- 
fined without introducing auxiliary variables. ic3ia is then unable to prove the 
property on the resulting system without the prophecy variable, instead timing 
out. Interestingly, notice that prophic3-SA solves the most benchmarks in the 
QF category overall, suggesting that there are practical performance benefits of 
the CEGAR approach even when quantified reasoning is not needed. 


There was one discrepancy on the CHC-COMP 2020 benchmarks: gspacer 
disagrees with quic3, Spacer, and prophic3 on chc-LIA-lin-arrays_881. This is 
the same discrepancy mentioned in the CHC-COMP 2020 report [58]. prophic3 
proved this benchmark safe without introducing any auxiliary variables and we 
used both CVC4 [5] and MathSAT to verify that the solution was indeed an in- 
ductive invariant for the concrete system. We are confident that this benchmark 
is safe and thus do not count it as a solved instance for gspacer. 


Some of the tools are sensitive to the encoding. Since it is syntax-guided, 
freqhorn is sensitive to the encoding syntax. The freqhorn benchmarks were
hand-written to be syntactically simple, an encoding which is also good for 
prophic3. However, prophic3 can be sensitive to other encodings. For example, 
the quic3 benchmarks are also included in the chc-comp-2020 set, but trans- 
lated by SeaHorn [35] instead of Kratos2. prophic3 does much worse on the 
SeaHorn encoding (6 vs 42). We stress that the CHC solvers performed similarly 
on both encodings, so we did not compare against disadvantaged solvers. In fact, 
quic3 and freqhorn solved exactly the same number in both translations. However,
gspacer solved fewer using the Kratos2 encoding (27 vs 34). Importantly,
prophic3 on the Kratos2 encoding solved more benchmarks than any other tool 
and encoding pair. 


There are two main reasons why prophic3 fails on the SeaHorn encodings. 
First, due to the LLVM-based encoding, some of the SeaHorn translations have 
index sets which are insufficient for finding the right prophecy variable. This has 
to do with the memory encoding and the way that fresh variables and guards 
are used. SeaHorn also splits memories into ranges which is problematic for our
technique. Second, the SeaHorn translation is optimized for CHC, not for transition
systems. For example, it introduces many new variables, and the argument
order between different predicates may not match. In the transition system, this 
essentially has the effect of interchanging the values of variables between each 
loop. SeaHorn has options that address some of these issues, and these helped 
prophic3 solve more benchmarks, but none of these options produce encod- 
ings that work as well as the Kratos2 encodings. The difference between good 
CHC and transition system encodings could also explain the overall difference 
in performance on chc-comp-2020 benchmarks, most of which were translated 
by SeaHorn. Both of these issues are practical, not fundamental, and we believe 
they can be resolved with additional engineering effort. 


7 Related Work 


There are two important related approaches for abstracting arrays in Horn clauses
[53] and memories in hardware [10]. Both make a similar observation that ar- 
rays can be abstracted by modifying the property to maintain values at only a 
finite set of symbolic indices. We differ from the former by using a refinement 
loop that automatically adjusts the precision and targets relevant indices. The 
latter is also a refinement loop that adjusts precision, but differs in the domain 
and the refinement approach, which uses a multiplexer tree. We differ from both 
approaches in our use of array axioms to find and add auxiliary variables. 

A similar lazy array axiom instantiation technique is proposed in [15]. How- 
ever, their technique utilizes interpolants for finding violated axioms and cannot 
infer universally quantified invariants. The work of [18] also uses lazy axiom- 
based refinement, abstracting non-linear arithmetic with uninterpreted func- 
tions. We differ in the domain and the use of auxiliary variables. In [55], prophecy 
variables defined by temporal logic formulas are used for liveness and temporal 
proofs, with the primary goal of increasing the power of a temporal proof sys- 
tem. In contrast, we use prophecy variables here for a different purpose, and we 
also find them automatically. The work of [24] includes an approach for synthe- 
sizing auxiliary variables for modular verification of concurrent programs. Our 
approach differs significantly in the domain and details. 

There is a substantial body of work on automated quantified invariant gen- 
eration for arrays using first-order theorem provers [42,16,41,51]. These include 
extensions to saturation-based theorem proving to analyze specific kinds of pred- 
icates, and an extension to paramodulation-based theorem proving to produce 
universally quantified interpolants. In [46], the authors propose an abstract in- 
terpretation approach to synthesize universally quantified array invariants. Our 
method also uses abstraction, but in a CEGAR framework. 

Two other notable approaches capable of proving properties over arrays that 
require invariants with alternating quantifiers are [30,56]. The former proposes 
trace logic for extending first-order theorem provers to software verification, and 
the latter takes a counterexample-guided inductive synthesis approach. Our approach
takes a model checking perspective and differs significantly in the details.
While these approaches are more general, we compared against state-of-the-art
tools that focus specifically on universally quantified invariants. 

MCMT [31,33,25] and its derivatives [2,3] are backward-reachability algo- 
rithms for proving properties over “array-based systems,” which are typically 
used to model parameterized protocols. These approaches target syntactically 
restricted functional transition systems with universally quantified properties, 
whereas our approach targets general transition systems. Two other approaches 
for solving parameterized systems modeled with arrays are [36] and [47]. The 
former iteratively fixes the number of expected universal quantifiers, then ea- 
gerly instantiates them and encodes the invariant search to nonlinear CHC. The 
latter first uses a finite-state model checker to discover an inductive invariant for 
a specific parameterization and then applies a heuristic generalization process. 
We differ from all these techniques in domain and the use of auxiliary variables. 
Due to the limitations explained in Sec. 5, we do not expect our approach to 
work well for parameterized protocol verification without improvements. 

In [45], heuristics are proposed for finding predicates with free indices that 
can be universally quantified in a predicate abstraction-based inductive invariant 
search. Our approach is counterexample-guided and does not utilize predicate 
abstraction directly (although IC3IA does). The authors of [39] propose a tech- 
nique for Java programs that associates heap memory with the program location 
where it was allocated and generates CHC verification conditions. This enables 
the discovery of invariants over all heap memory allocated at that location, which 
implicitly provides quantified invariants. This is similar to our approach in that 
it gives quantification power without explicitly using quantifiers and in that 
their encoding removes arrays. However, we differ in that we focus on transition 
systems and utilize a different paradigm to obtain this implicit quantification. 


8 Conclusion 


We presented a novel approach for model checking transition systems containing 
arrays. We observed that history and prophecy variables can be extremely useful 
for reducing quantified invariants to quantifier-free invariants. We demonstrated 
that an initially weak abstraction in our CEGAR loop can help us to automati- 
cally introduce relevant auxiliary variables. Finally, we evaluated our approach 
on four sets of interesting array-manipulating benchmarks. In future work, we 
hope to improve performance, explore a tighter integration with the underly- 
ing model checker, address the limitations described in Sec. 5, and investigate 
applications of counterexample-guided prophecy to other theories.

Abstract. Since 2013, the leading SAT solvers in the SAT competition all use
inprocessing, which, unlike preprocessing, interleaves search with simplifications.
However, applying inprocessing frequently can still be a bottleneck, particularly for
hard or large formulas. In this work, we introduce the first attempt to parallelize
inprocessing on GPU architectures. As memory is a scarce resource in GPUs, we
present new space-efficient data structures and devise a data-parallel garbage
collector. It runs in parallel on the GPU to reduce memory consumption and
improve memory access locality. Our new parallel variable elimination algorithm
is twice as fast as previous work. In experiments our new solver PARAFROST 
solves many benchmarks faster on the GPU than its sequential counterparts. 


Keywords: Satisfiability · Variable Elimination · Eager Redundancy Elimination
· Parallel SAT Inprocessing · Parallel Garbage Collection · GPU.


1 Introduction 


During the past decade, SAT solving has been used extensively in many applications, 
such as combinational equivalence checking [27], automatic test pattern generation [33, 
40], automatic theorem proving [14], and symbolic model checking [7, 13]. Simplifying 
SAT problems prior to solving them has proven its effectiveness in modern conflict- 
driven clause learning (CDCL) SAT solvers [5, 6, 17], particularly when applied on 
real-world applications relevant to software and hardware verification [16, 20, 22, 24]. 
Since 2013, simplification techniques [8, 16, 19, 21, 41] have also been used periodically
during SAT solving, which is known as inprocessing [3-6, 23]. Applying inprocessing
iteratively to large problems can be a performance bottleneck in the SAT solving procedure,
or even increase the size of the formula, negatively impacting the solving time.
Graphics processors (GPUs) have become attractive for general-purpose computing 
with the availability of the Compute Unified Device Architecture (CUDA) program- 
ming model. CUDA is widely used to accelerate applications that are computation- 
ally intensive w.r.t. data processing. For instance, we have applied GPUs to accelerate 
explicit-state model checking [11, 43], bisimilarity checking [42], the reconstruction of
genetic networks [12], wind turbine emulation [30], metaheuristic SAT solving [44],
and SAT-based test generation [33]. Recently, we introduced SIGmA [34, 35] as the 
first SAT simplification preprocessor to exploit GPUs. 


Contributions. Embedding GPU inprocessing in a SAT solver is highly non-trivial and,
to the best of our knowledge, has never been attempted before. Efficient data
structures are needed that allow parallel processing and that support efficient addition
and removal of clauses. For this purpose, we contribute the following:
1. We propose a new dynamically expanded data structure for clauses supporting both 
32-bit [17] and 64-bit references with a minimum of 20 bytes per clause. 
2. A new parallel garbage collector is presented, tailored for GPU inprocessing. 
3. Our new parallel variable elimination algorithm is twice as fast as [34] and together 
with other improvements yields much higher performance and robustness. 
4. Our parallel inprocessing is deterministic (i.e., results are reproducible). 
In addition, we propose a new preprocessing technique targeted towards data-parallel 
execution, called Eager Redundancy Elimination (ERE), which is applicable on both 
original and learnt clauses. All contributions have been implemented in our solver 
PARAFROST and benchmarked on a larger set than considered previously in [34], 
using 493 application problems. We discuss the potential performance gain of the GPU 
inprocessing and its impact on SAT solving, compared to a sequential version of our 
solver as well as CADICAL [6], a state-of-the-art solver developed by the last author. 


2 Preliminaries 


All SAT formulas in this paper are in conjunctive normal form (CNF). A CNF formula
is a conjunction of $m$ clauses $\bigwedge_{i=1}^{m} C_i$, where each clause $C_i$ is a
disjunction of $k$ literals $\bigvee_{j=1}^{k} \ell_j$, and a literal is a Boolean variable
$x$ or its complement $\neg x$, which we refer to as $\bar{x}$. We represent clauses by sets
of literals, i.e., $\{\ell_1, \ldots, \ell_k\}$ represents the formula
$\ell_1 \vee \ldots \vee \ell_k$, and a SAT formula by a set of clauses, i.e.,
$\{C_1, \ldots, C_m\}$ represents the formula $C_1 \wedge \ldots \wedge C_m$. With $S_\ell$,
we refer to the set of clauses containing literal $\ell$, i.e.,
$S_\ell = \{C \in S \mid \ell \in C\}$. If for a variable $x$, we have either
$S_x = \emptyset$ or $S_{\bar{x}} = \emptyset$ (but not both), then the literal $\bar{x}$
or $x$, respectively, is called a pure literal. A clause $C$ is a tautology iff there
exists a variable $x$ with $\{x, \bar{x}\} \subseteq C$, and $C$ is unit iff $|C| = 1$.
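
As a minimal sketch of these definitions, the snippet below represents clauses as Python frozensets of integers. The integer encoding (variable $v$ as `+v`, its complement as `-v`) and all function names are illustrative assumptions and are not taken from PARAFROST.

```python
from typing import FrozenSet, Optional, Set

Clause = FrozenSet[int]  # assumed encoding: variable v is +v, its complement is -v


def occurrences(formula: Set[Clause], lit: int) -> Set[Clause]:
    """S_lit: the clauses of the formula that contain the literal."""
    return {c for c in formula if lit in c}


def is_tautology(clause: Clause) -> bool:
    """True iff some variable occurs in the clause in both polarities."""
    return any(-lit in clause for lit in clause)


def is_unit(clause: Clause) -> bool:
    """True iff the clause consists of exactly one literal."""
    return len(clause) == 1


def pure_literal(formula: Set[Clause], var: int) -> Optional[int]:
    """Return the pure literal of var, or None if both or neither polarity occurs."""
    pos, neg = occurrences(formula, var), occurrences(formula, -var)
    if pos and not neg:
        return var
    if neg and not pos:
        return -var
    return None


# Hypothetical example formula: {x1, not x2}, {x1, x2, x3}, {not x3}
S = {frozenset({1, -2}), frozenset({1, 2, 3}), frozenset({-3})}
```

In this example, variable 1 occurs only positively, so literal 1 is pure, whereas variables 2 and 3 occur in both polarities.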

In this paper we integrate GPU-accelerated inprocessing and CDCL [28, 32, 36]. 
One important aspect of CDCL is to learn from previous assignments to prune the 
search space and make better decisions in the future. This learning process involves the 
periodic adding of new learnt clauses to the input formula while CDCL is running. 

In this paper, clauses are either considered to be LEARNT or ORIGINAL (redundant 
and irredundant in [23] and in the SAT solver CADICAL [6]). A LEARNT clause is 
added to the formula by the CDCL clause learning process, and an ORIGINAL clause is 
part of the formula from the very start. Furthermore, each assignment is associated with 
a decision level that acts as a time stamp, to monitor the order in which assignments are 
performed. The first assignment is made at decision level one. 


Variable Elimination (VE). Variables can be removed from clauses by either applying
the resolution rule or substitution (also known as gate equivalence reasoning) [16, 23].
Concerning the former, we represent application of the resolution rule w.r.t. some variable
$x$ using a resolving operator $\otimes_x$ on clauses $C_1$ and $C_2$. The result of applying
the rule is called the resolvent [41]. It is defined as
$C_1 \otimes_x C_2 = (C_1 \cup C_2) \setminus \{x, \bar{x}\}$,
and can be applied iff $x \in C_1$, $\bar{x} \in C_2$. The $\otimes_x$ operator can be extended
to resolve sets of clauses w.r.t. variable $x$. For a formula $S$, let $L \subseteq S$ be the set
of learnt clauses when we apply the resolution rule. The set of new resolvents is then defined as
$R_x(S) = \{C_1 \otimes_x C_2 \mid C_1 \in S_x \setminus L \wedge C_2 \in S_{\bar{x}} \setminus L \wedge \neg\exists y.\, \{y, \bar{y}\} \subseteq C_1 \otimes_x C_2\}$.
Notice that the learnt clauses can be ignored [23] (i.e., in practice, it is not effective to
apply resolution on learnt clauses). The last condition ensures that no resolvent is a
tautology. After eliminating variable $x$ in $S$, the resulting formula $S'$ is defined as
$S' = R_x(S) \cup (S \setminus (S_x \cup S_{\bar{x}}))$, i.e., the new resolvents are combined
with the original and learnt clauses that do not reference $x$.
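
The definitions of $\otimes_x$, $R_x(S)$ and $S'$ can be sketched sequentially as follows. This is only an illustration under assumed conventions (clauses as frozensets of signed integers, hypothetical function names); it does not reflect the parallel implementation.

```python
def resolve(c1, c2, x):
    """Resolvent of c1 and c2 on variable x; requires x in c1 and -x in c2."""
    assert x in c1 and -x in c2
    return (c1 | c2) - {x, -x}


def is_tautology(clause):
    return any(-lit in clause for lit in clause)


def eliminate_variable(formula, learnt, x):
    """Return S' = R_x(S) united with the clauses that do not reference x."""
    pos = [c for c in formula if x in c and c not in learnt]
    neg = [c for c in formula if -x in c and c not in learnt]
    resolvents = set()
    for c1 in pos:
        for c2 in neg:
            r = resolve(c1, c2, x)
            if not is_tautology(r):  # tautological resolvents are dropped
                resolvents.add(r)
    rest = {c for c in formula if x not in c and -x not in c}
    return resolvents | rest


# Hypothetical example: eliminate variable 1 from four original clauses.
S = {frozenset({1, 2}), frozenset({-1, 3}), frozenset({-1, -2}), frozenset({4})}
S_prime = eliminate_variable(S, learnt=set(), x=1)
```

Here the resolvent of {1, 2} and {-1, -2} is the tautology {2, -2} and is discarded, so only {2, 3} is added next to the untouched clause {4}.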


Substitution detects patterns encoding logical gates, and substitutes the involved 
variables with their gate-equivalent counterparts. Previously [34], we only considered 
AND gates. In the current work, we add support for Inverter, If-Then-Else and XOR gate 
extractions. For all logical gates, substitution can be performed by resolving non-gate 
clauses (i.e., clauses not contributing to the gate itself) with gate clauses [23]. 


For instance, the first three clauses in the formula
$\{\{x, \bar{a}, \bar{b}\}, \{\bar{x}, a\}, \{\bar{x}, b\}, \{x, c\}\}$
together encode a logical AND-gate, hence the final clause can be resolved with the second
and the third clauses, producing the simplified formula $\{\{a, c\}, \{b, c\}\}$. Combining
gate equivalence reasoning with the resolution rule tends to result in smaller formulas
compared to only applying the resolution rule [16, 23, 37].
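
The AND-gate example can be reproduced with the following sketch, which resolves non-gate clauses only against gate clauses. The gate detection shown here is a simplified assumption (it only recognises $x \leftrightarrow a_1 \wedge \ldots \wedge a_n$ with $n \geq 2$), and the names are hypothetical.

```python
def find_and_gate(formula, x):
    """Look for clauses encoding x <-> (a1 AND ... AND an), n >= 2."""
    for big in formula:
        if x in big and len(big) >= 3:
            inputs = [-lit for lit in big if lit != x]
            binaries = [frozenset({-x, a}) for a in inputs]
            if all(b in formula for b in binaries):
                return [big], binaries  # gate clauses with x, with -x
    return None


def substitute(formula, x):
    """Eliminate x by resolving non-gate clauses with gate clauses only."""
    gate = find_and_gate(formula, x)
    if gate is None:
        return None
    gate_pos, gate_neg = gate
    gate_all = set(gate_pos) | set(gate_neg)
    non_pos = [c for c in formula if x in c and c not in gate_all]
    non_neg = [c for c in formula if -x in c and c not in gate_all]
    pairs = [(g, h) for g in gate_pos for h in non_neg]
    pairs += [(h, g) for h in non_pos for g in gate_neg]
    result = {c for c in formula if x not in c and -x not in c}
    for c1, c2 in pairs:
        r = (c1 | c2) - {x, -x}
        if not any(-lit in r for lit in r):  # skip tautologies
            result.add(r)
    return result


# Encoding assumed for the example: x=1, a=2, b=3, c=4.
S = {frozenset({1, -2, -3}), frozenset({-1, 2}), frozenset({-1, 3}),
     frozenset({1, 4})}
simplified = substitute(S, 1)
```

With this encoding, the result consists of the clauses {2, 4} and {3, 4}, i.e., $\{\{a, c\}, \{b, c\}\}$.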


Subsumption elimination. SUB performs self-subsuming resolution followed by subsumption
elimination [16]. The former can be applied on clauses $C_1$, $C_2$ iff for some
variable $x$, we have $C_1 = C_1' \cup \{x\}$, $C_2 = C_2' \cup \{\bar{x}\}$, and
$C_2' \subseteq C_1'$. In that case, $x$ can be removed from $C_1$. The latter is applied on
clauses $C_1$, $C_2$ with $C_2 \subseteq C_1$. In that case, $C_1$ is redundant and can be
removed. If $C_2$ is a LEARNT clause, it must be considered as ORIGINAL in the future, to
prevent deleting it during learnt clause reduction, a procedure which attempts to reduce
the number of learnt clauses [6, 23]. For instance, consider the formula
$S = \{\{a, b, c\}, \{\bar{a}, b\}, \{b, c, d\}\}$. The first clause is self-subsumed
by the second clause w.r.t. variable $a$ and can be strengthened to $\{b, c\}$, which in turn
subsumes the last clause $\{b, c, d\}$. The latter clause is then removed from $S$ and the
simplified formula becomes $\{\{b, c\}, \{\bar{a}, b\}\}$.
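
A small sequential sketch of SUB on this example is given below. It is an assumed, quadratic-time illustration with hypothetical names; it ignores the LEARNT/ORIGINAL bookkeeping and the signature-based filtering used in practice.

```python
def self_subsume(c1, c2):
    """If c2 self-subsumes c1 on some literal, return the strengthened c1."""
    for lit in c1:
        if -lit in c2 and (c2 - {-lit}) <= (c1 - {lit}):
            return c1 - {lit}
    return None


def sub(formula):
    """Self-subsuming resolution to a fixpoint, then subsumption elimination."""
    clauses = list(formula)
    changed = True
    while changed:
        changed = False
        for i, c1 in enumerate(clauses):
            for c2 in clauses:
                strengthened = self_subsume(c1, c2)
                if strengthened is not None:
                    clauses[i] = strengthened
                    changed = True
                    break
            if changed:
                break
    # A clause is removed if another clause is a proper subset of it.
    return {c for c in clauses if not any(d < c for d in clauses)}


# Encoding assumed for the example: a=1, b=2, c=3, d=4.
S = [frozenset({1, 2, 3}), frozenset({-1, 2}), frozenset({2, 3, 4})]
simplified = sub(S)
```

The first clause is strengthened to {2, 3}, which then subsumes {2, 3, 4}, leaving {2, 3} and {-1, 2}.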


Blocked clause elimination. BCE [25] can remove clauses for which variable elimination
always results in tautologies. Consider the formula
$S = \{\{a, b, c\}, \{\bar{a}, \bar{b}\}, \{\bar{a}, \bar{c}\}\}$. All
three literals $a$, $b$ and $c$ are blocking the first clause, since resolving $a$ produces
the tautologies $\{\{b, c, \bar{b}\}, \{b, c, \bar{c}\}\}$, resolving $b$ produces
$\{\bar{a}, a, c\}$, and resolving $c$ produces $\{\bar{a}, a, b\}$. Hence the blocked clause
$\{a, b, c\}$ can be removed from $S$. Again, as for VE,
only original clauses are considered.


Eager Redundancy Elimination. ERE is a new elimination technique that we propose,
which repeats the following until a fixpoint has been reached: for a given formula $S$ and
clauses $C_1 \in S$, $C_2 \in S$ with $x \in C_1$ and $\bar{x} \in C_2$ for some variable $x$,
if there exists a clause $C \in S$ for which $C \equiv C_1 \otimes_x C_2$, then let
$S := S \setminus \{C\}$. In this work, we restrict removing $C$ to the condition
($C_1$ is LEARNT $\vee$ $C_2$ is LEARNT) $\Rightarrow$ $C$ is LEARNT.
If the condition holds, $C$ is called a redundancy and can be removed without altering
the original satisfiability. For example, consider
$S = \{\{a, \bar{c}\}, \{c, b\}, \{\bar{d}, \bar{c}\}, \{b, a\}, \{a, d\}\}$.
Resolving the first two clauses gives the resolvent $\{a, b\}$, which is equivalent to
the fourth clause in $S$. Also, resolving the third clause with the last clause yields
$\{a, \bar{c}\}$, which is equivalent to the first clause in $S$. ERE can remove either
$\{a, \bar{c}\}$ or $\{a, b\}$ but not both. Note that this method is entirely different from
Asymmetric Tautology Elimination in [21]. The latter requires adding so-called hidden
literals to all clauses to check which is a hidden tautology. ERE can operate on learnt
clauses and does not require adding literals, making it more effective and better suited
to data parallelism.
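
The sketch below only identifies the redundancy candidates of ERE, i.e., clauses that coincide with a resolvent of two other clauses and satisfy the LEARNT condition. How the solver chooses among mutually dependent candidates is not modelled here; the function name and the integer encoding are assumptions.

```python
def ere_candidates(clauses, learnt=frozenset()):
    """Clauses equal to some resolvent C1 (x) C2 that satisfy the ERE condition."""
    present = set(clauses)
    found = set()
    for c1 in clauses:
        for c2 in clauses:
            for x in c1:
                if -x not in c2:
                    continue
                r = (c1 | c2) - {x, -x}
                if r not in present:
                    continue
                premise_learnt = c1 in learnt or c2 in learnt
                # (C1 is LEARNT or C2 is LEARNT) implies C is LEARNT
                if not premise_learnt or r in learnt:
                    found.add(r)
    return found


# Encoding assumed for the example: a=1, b=2, c=3, d=4.
A, B = frozenset({1, -3}), frozenset({3, 2})
C, D, E = frozenset({-4, -3}), frozenset({2, 1}), frozenset({1, 4})
S = [A, B, C, D, E]
```

With all clauses ORIGINAL, both $\{a, b\}$ and $\{a, \bar{c}\}$ are reported as candidates, matching the example. If $\{a, \bar{c}\}$ were LEARNT, $\{a, b\}$ would no longer qualify, because one of its premises is learnt while $\{a, b\}$ itself is not.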


3 GPU Memory and Data Structures 


GPU Architecture. Since 2007, NVIDIA has been developing a parallel computing 
platform called CUDA [31] that allows developers to use GPU resources for general 
purpose processing. A GPU contains multiple streaming multiprocessors (SMs), each 
SM consisting of an array of streaming processors (SPs). Every SM can execute multi- 
ple threads grouped together in 32-thread scheduling units called warps. 

A GPU computation can be launched in a program by the host (CPU side of a 
program) by calling a GPU function called a kernel, which is executed by the device 
(GPU side of a program). When a kernel is called, it is specified how many threads need 
to execute it. These threads are partitioned into thread blocks of up to 1,024 threads 
(or 32 warps). Each block is assigned to an SM. All threads together form a grid. A 
hardware warp scheduler evenly distributes the launched blocks to the available SMs. 
Concerning the memory hierarchy, a GPU has multiple types of memory:

– Global memory with high bandwidth but also high latency is accessible by both
  GPU threads and CPU threads and thus acts as interface between CPU and GPU.
– Constant memory is read-only for all GPU threads. It has a lower latency than
  global memory, and can be used to store any pre-defined constants.
– Shared memory is on-chip memory shared by the threads in a block. Each SM has
  its own shared memory. It is much smaller in size than global and constant memory
  (in the order of tens of kilobytes), but has a much lower latency. It can be used to
  efficiently communicate data between threads in a block.
– Registers are used for on-chip storage of thread-local data. Register memory is very
  small, but it is the fastest memory.

To hide the latency of global memory, ensuring that the threads perform coalesced
accesses is one of the best practices. When the threads in a warp try to access a consecutive
block of 32-bit words, their accesses are combined into a single (coalesced)
memory access. Uncoalesced memory accesses can, for instance, be caused by data
sparsity or misalignment. Furthermore, we use unified memory [31] to store the main
data structures that need to be regularly accessed by both the CPU and the GPU. Unified
memory creates a pool of managed memory that is shared between the CPU and GPU.
This pool is accessible to both sides using the same addresses. Regarding atomicity, a
GPU can run atomic instructions on both global and shared memory. Such an instruction
performs a read-modify-write memory operation on one 32-bit or 64-bit word.

    class SCLAUSE {
        char state, flag;
        char added, used;
        int size, lbd;
        uint32 sig;
        union {
            uint32 literals[1];
        };
    }

    (a) container for a clause

    class CNF {
        struct {
            uint32* memory;
            uint64 size, cap;
        } clauses;
        struct {
            uint64* memory;
            uint32 size, cap;
        } references;
    }

    (b) container for a formula


Fig. 1: Data structures to store a SAT formula on a GPU


Data Structures. To efficiently implement inprocessing techniques for GPU architectures,
we designed a new data structure from scratch to count the number of learnt
clauses, and store other relevant clause information, while keeping the memory consumption
as low as possible. Fig. 1 shows the proposed structures to store a clause
(denoted by SCLAUSE) and the SAT formula represented in CNF form (denoted by
CNF). The state member in Fig. 1a stores the current clause state. A clause is either
ORIGINAL, LEARNT (see Section 2) or DELETED. A GPU thread is not allowed to deallocate
memory; however, a clause can be set to DELETED and freed later during garbage
collection. The members added and flag mark the clause as being a resolvent (when
applying the resolution rule) and as contributing to a gate (for substitution), respectively.
The lbd entry denotes the literal block distance (LBD), i.e., the number of decision
levels contributing to a conflict [2]. The used counter keeps track of how long
a LEARNT clause should be used before it gets deleted during database reduction [6, 38].
Both used and lbd can be altered via clause strengthening [6] in SUB.


The signature (sig) of a clause is computed by hashing its literals to a 32-bit
value [16]. It is used to quickly compare clauses. The first literal in a clause is preallocated
and stored in the fixed array literals[1]. As has been done for the MINISAT
solver, we adapted the union structure to allow dynamically expanding the literals
array. This is accepted by NVIDIA's compiler (NVCC). In our previous work [34], we
stored a pointer in each clause referencing the first literal, with the literals being in a
separate array. This consumes 8 bytes of the clause space. However, SCLAUSE only
needs 4 bytes for the literals array, resulting in the clause occupying 20 bytes in
total, including the extra information of the learnt clause, compared to 24 bytes in our
previous work.
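
The 20-byte figure can be checked with a small layout sketch using Python's ctypes module. The class below mirrors the field order of SCLAUSE in Fig. 1a under the assumption of 1-byte char and 4-byte int and uint32 fields; it is an illustration, not the solver's actual definition, and `buckets_needed` is a hypothetical helper.

```python
import ctypes


class SClause(ctypes.Structure):
    """Assumed mirror of SCLAUSE: four chars, two ints, sig, one literal."""
    _fields_ = [
        ("state", ctypes.c_char), ("flag", ctypes.c_char),
        ("added", ctypes.c_char), ("used", ctypes.c_char),
        ("size", ctypes.c_int32), ("lbd", ctypes.c_int32),
        ("sig", ctypes.c_uint32),
        ("literals", ctypes.c_uint32 * 1),  # first literal, stored inline
    ]


BUCKET_BYTES = 4  # one bucket is one 32-bit word
ALPHA = ctypes.sizeof(SClause) // BUCKET_BYTES  # buckets for the clause header


def buckets_needed(num_literals: int) -> int:
    """Header buckets plus one bucket per literal beyond the inline one."""
    return ALPHA + (num_literals - 1)
```

Under these assumptions the structure occupies 20 bytes, i.e., 5 buckets, and a clause with three literals needs 5 + 2 = 7 buckets.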


As implemented in MINISAT, we use the clauses field in CNF (Fig. 1b) to store
the raw bytes of SCLAUSE instances with any extra literals in 4-byte buckets with 64-bit
reference support. The cap variable indicates the total memory capacity available
for the storage of clauses, and size reflects the current size of the list of clauses. We
always have size ≤ cap. The references field is used to directly access the clauses
by saving for each clause a reference to its first bucket. The mechanism for storing
references works in the same way as for clauses.


In addition, in a similar way, an occurrence table structure, denoted by OT, is created
which has a raw pointer to store the 64-bit clause references for each literal in the
formula and a member structure OL. The creation of an OL instance is done in parallel
on the GPU for each literal using atomic instructions. For each clause C, a thread is
launched to insert the occurrences of C's literals in the associated lists.

Initially, we pre-allocate unified memory for clauses and references that is
twice as large as the input formula, to guarantee enough space for the original and
learnt clauses. This amount is guaranteed to be enough as we enforce that the number
of resolvents never exceeds the number of ORIGINAL clauses. The OT memory is reallocated
dynamically if needed after each variable elimination. Furthermore, we check
the amount of free available GPU memory before allocation is done. If no memory is
available, the inprocessing step is skipped and the solving continues on the CPU.


4 Parallel Garbage Collection 


Modern sequential SAT solvers implement a garbage collection (GC) algorithm to reduce
memory consumption and maintain data locality [2, 6, 17].

Since GPU global memory is a scarce resource and coalesced accesses are essential
to hide the latency of global memory (see Section 3), we decided to develop an efficient
and parallel GC algorithm for the GPU without adding overhead to the GPU computations.


Fig. 2 demonstrates the proposed approach for a simple SAT formula
$S = \{\{a, \bar{b}, c\}, \{a, b, \bar{c}\}, \{d, \bar{b}\}, \{\bar{d}, b\}\}$, in which
$\{a, b, \bar{c}\}$ is to be deleted. The figure shows, in addition, how the references and
clauses lists in Fig. 1b are updated for the given formula. The reference for each clause
$C$ is calculated based on the sum of the sizes (in buckets) of all clauses preceding $C$ in
the list of clauses. For example, the first clause ($C_1$) requires
$\alpha + (k - 1) = 5 + 2 = 7$ buckets, where the constant $\alpha$ is the number of
buckets needed to store SCLAUSE, in our case 20 bytes / 4 bytes, and $k$ is the clause size
in terms of the number of literals. Given the number of buckets needed for $C_1$, the next
clause ($C_2$) must be stored starting from position 7 in the list of clauses. This position
plus the size of $C_2$ determines in a similar way the starting position for $C_3$, and so on.

Fig. 2: An example of parallel GC on a GPU (Steps 1-3 applied to clauses C1-C4, with C2
marked for deletion)

Algorithm 1: Parallel Garbage Collection

Input : __global__ S_in, stencil, buckets, __constant__ α, __shared__ shCls, shLits
Output: numCls, numLits

 1  numCls, numLits ← COUNTSURVIVED(S_in);
 2  S_out ← ALLOCATE(numCls, numLits);
 3  stencil, buckets ← COMPUTESTENCIL(S_in);
 4  buckets ← EXCLUSIVESCAN(buckets);
 5  references(S_out) ← COMPACTREFS(buckets, stencil);
 6  COPYCLAUSES(S_out, S_in, buckets, stencil);

 7  kernel COUNTSURVIVED(S_in):
 8      register rCls ← 0, rLits ← 0;
 9      for all i ∈ [0, |S_in|) in parallel
10          register C ← S_in[i];
11          if state(C) ≠ DELETED then
12              rCls ← rCls + 1, rLits ← rLits + |C|;
13      if tid < |S_in| then
14          shCls[tid] ← rCls, shLits[tid] ← rLits;
15      else
16          shCls[tid] ← 0, shLits[tid] ← 0;
17      SYNCTHREADS();
18      for b : blockDim/2, b/2 → 1 do   // b will be blockDim/2, (blockDim/2)/2, ..., 1
19          if tid < b then
20              shCls[tid] ← shCls[tid] + shCls[tid + b], shLits[tid] ← shLits[tid] + shLits[tid + b];
21          SYNCTHREADS();
22      if tid = 0 then
23          ATOMICADD(numCls, shCls[tid]), ATOMICADD(numLits, shLits[tid]);

24  kernel COMPUTESTENCIL(S_in):
25      for all i ∈ [0, |S_in|) in parallel
26          register C ← S_in[i];
27          if state(C) = DELETED then
28              stencil[i] ← 0, buckets[i] ← 0;
29          else
30              stencil[i] ← 1, buckets[i] ← α + (|C| − 1);

31  kernel COPYCLAUSES(S_out, S_in, buckets, stencil):
32      for all i ∈ [0, |S_in|) in parallel
33          if stencil[i] then
34              register & C_dest ← (SCLAUSE &)(clauses(S_out) + buckets[i]);
35              C_dest ← S_in[i];

The first step towards compacting the CNF instance when $C_2$ is to be deleted is
to compute a stencil and a list of corresponding clause sizes in terms of numbers of
buckets. In this step, each clause $C_i$ is inspected by a different thread that writes a '0'
at position $i$ of a list named stencil if the clause must be deleted, and a '1' otherwise.
The size of stencil is equal to the number of clauses. In a list of the same size called
buckets, the thread writes at position $i$ a '0' if the clause will be deleted, and otherwise
the size of the clause in terms of the number of buckets.

At step 2, a parallel exclusive-segmented scan operation is applied on the buckets
array to compute the new references. In this scan, the value stored at position $i$, masked
by the corresponding stencil, is the sum of the values stored at positions 0 up to, but
not including, $i$. An optimised GPU implementation of this operation is available via
the CUDA CUB library [29], which transforms a list of size $n$ in $\log(n)$ iterations. In
the example, this results in $C_3$ being assigned reference 7, thereby replacing $C_2$.

At step 3, the stencil list is used to update references in parallel, which are
kept together in consecutive positions. The standard DeviceSelect::Flagged
function of the CUB library can be used for this, which uses stream compaction [10].
Finally, the actual clauses are copied to their new locations in clauses.
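
The three steps can be emulated sequentially as in the sketch below, which replays the example with four clauses of which the second is deleted. The exclusive scan and the flagged selection are written with plain Python constructs rather than CUB calls, and the function name and return layout are assumptions made for illustration.

```python
from itertools import accumulate

ALPHA = 5  # buckets needed for the clause header (20 bytes / 4 bytes)


def collect(clauses, deleted):
    """Sequential emulation of steps 1-3 of the parallel GC."""
    # Step 1: stencil and per-clause bucket counts.
    stencil = [0 if d else 1 for d in deleted]
    buckets = [0 if d else ALPHA + (len(c) - 1)
               for c, d in zip(clauses, deleted)]
    # Step 2: exclusive scan; entry i is the sum of buckets[0..i-1].
    offsets = [0] + list(accumulate(buckets))[:-1]
    # Step 3: keep the references flagged by the stencil, in order.
    references = [off for off, keep in zip(offsets, stencil) if keep]
    survivors = [c for c, keep in zip(clauses, stencil) if keep]
    return stencil, buckets, offsets, references, survivors


# Encoding assumed for the example: a=1, b=2, c=3, d=4; C2 is deleted.
clauses = [(1, -2, 3), (1, 2, -3), (4, -2), (-4, 2)]
deleted = [False, True, False, False]
stencil, buckets, offsets, references, survivors = collect(clauses, deleted)
```

In this run the bucket counts are 7, 0, 6, 6 and the compacted references are 0, 7, 13, so the third clause takes over reference 7 from the deleted second clause.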

Alg. 1 describes in detail the GPU implementation of the parallel GC. As input,
Alg. 1 requires a SAT formula $S_{in}$ as an instance of CNF. The constant $\alpha$ is kept in
GPU constant memory for fast access. The lines highlighted in grey are executed on the
GPU. To begin GC, we count the number of clauses and literals in the $S_{in}$ formula after
simplification has been applied (line 1). The counting is done via the parallel reduction
kernel COUNTSURVIVED, listed at lines 7-23. In kernels, we use two conventions. First
of all, with tid, we refer to the block-local ID of the executing thread. By using this ID,
we can achieve that different threads in the same block work on different data, as for
instance at lines 13-16. Second of all, we use so-called grid-stride loops to process data
elements in parallel. An example of this starts at line 9. The statement for all $i \in [0, N)$
in parallel expresses that all natural numbers in the range $[0, N)$ must be considered
in the loop, and that this is done in parallel by having each executing thread start with
element tid, i.e., $i$ = tid, and before starting each additional iteration through the loop,
the thread adds to $i$ the total number of threads on the GPU. If the updated $i$ is smaller
than $N$, the next iteration is performed with this updated $i$. Otherwise, the thread exits
the loop. A grid-stride loop ensures that when the range of numbers to consider is larger
than the number of threads, all numbers are still processed.
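
The index pattern of a grid-stride loop can be illustrated with the following sketch. It assumes a single global thread index per thread and a hypothetical helper name; on a real GPU each thread would execute only its own loop.

```python
def grid_stride_indices(thread_id, total_threads, n):
    """Indices in [0, n) visited by one thread of a grid-stride loop."""
    indices = []
    i = thread_id
    while i < n:
        indices.append(i)
        i += total_threads  # jump ahead by the total number of threads
    return indices


# With 4 threads and 10 elements, every index is visited exactly once.
covered = sorted(i for t in range(4) for i in grid_stride_indices(t, 4, 10))
```

For instance, thread 1 of 4 processes the elements 1, 5 and 9, and a thread whose index is not smaller than the number of elements performs no iterations.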


The values rCls and rLits at line 8 will hold the current number of clauses and
literals, respectively, counted by the executing thread. The register keyword indicates
that the variables are stored in the thread-local register memory. Within the loop at lines
9-12, the counters rCls, rLits are updated incrementally if the clause at position $i$ in
clauses is not deleted. Once a thread has checked all its assigned clauses, it stores the
counter values in the (block-local) shared memory arrays (shCls, shLits) at lines 13-14.


A non-participating thread simply writes zeros (line 16). Next, all threads in the
block are synchronised by the SYNCTHREADS call. The loop at lines 18-21 performs the
actual parallel reduction to accumulate the number of non-deleted clauses and literals
in shared memory within thread blocks. In the for loop, b is initially set to half the number
of threads in the block (blockDim/2), and in each iteration, this value is divided by 2 until
it is equal to 1 (note that blocks always consist of a power of two number of threads).


The total number of clauses and literals is in the end stored by thread 0, and this
thread adds those numbers using atomic instructions to the globally stored counters
numCls and numLits at line 23, resulting in the final output. In the procedure described
here, we avoid having each thread perform atomic instructions on the global memory,
which prevents a potential performance bottleneck. The computed numbers are used
to allocate enough memory for the output formula at line 2 on the CPU side.
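
The block-level reduction of lines 18-21 can be emulated sequentially as sketched below. The inner loop stands in for the threads with tid < b, and the list plays the role of one shared memory array; the function name is hypothetical.

```python
def block_reduce(shared):
    """Tree reduction over a list whose length is a power of two."""
    vals = list(shared)
    b = len(vals) // 2
    while b >= 1:
        # Threads with tid < b add the element that is b positions ahead.
        for tid in range(b):
            vals[tid] += vals[tid + b]
        b //= 2  # corresponds to the barrier followed by halving b
    return vals[0]  # the block total ends up at thread 0


# Per-thread clause counts of one block of eight threads (made-up values).
total = block_reduce([1, 2, 3, 4, 5, 6, 7, 8])
```

Since a step only reads positions at or beyond b and only writes positions below b, running the active threads one after another gives the same result as running them concurrently. Only the block total is then added atomically to the global counter, i.e., one atomic operation per block instead of one per thread.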


The kernel COMPUTESTENCIL, called at line 3, is responsible for checking clause 
states and computing the number of buckets for each clause. The COMPUTESTENCIL 
kernel is given at lines 24-30. If a clause C is set to DELETED (line 27), the correspond- 
ing entries in stencil and buckets are cleared at line 28, otherwise the stencil 
entry is set to 1 and the buckets entry is updated with the number of clause buckets. 


The EXCLUSIVESCAN routine at line 4 calculates the new references to store the
remaining clauses based on the collected buckets. For that, we use the exclusive scan
method offered by the CUB library. The COMPACTREFS routine called at line 5 groups
the valid references, i.e., those flagged by stencil, into consecutive values and stores
them in references($S_{out}$), which refers to the references field of the output formula
$S_{out}$. Finally, copying clause contents (literals, state, etc.) is done in the COPYCLAUSES
kernel, called at line 6. This kernel is described at lines 31-35. If a clause in
$S_{in}$ is flagged by stencil via thread $i$, then a new SCLAUSE reference is created in
clauses($S_{out}$), which refers to the clauses field in $S_{out}$, offset by buckets[i].

The GC mechanism described above resulted from experimenting with several less 
efficient mechanisms first. In the first attempt, two atomic additions per thread were 
performed for each clause, one to move the non-deleted clause buckets and the other 
for moving the corresponding reference. However, the excessive use of atomics resulted 
in a performance bottleneck and produced a different simplified formula on each run, 
that is, the order in which the new clauses were stored depended on the outcome of 
the atomic instructions. The second attempt was to maintain stability by moving the 
GC to the host side. However, accessing unified memory on the host side results in a 
performance penalty, as it implicitly results in copying data to the host side. 


5 Parallel Inprocessing Procedure 


To exploit parallelism in simplifications, each elimination method is applied on multiple
variables simultaneously. Doing so is non-trivial, since variables may depend
on each other; two variables $x$ and $y$ are dependent iff there exists a clause $C$ with
$(x \in C \vee \bar{x} \in C) \wedge (y \in C \vee \bar{y} \in C)$. If both $x$ and $y$ were to be
processed for simplification, two threads might manipulate $C$ at the same time. To guarantee
soundness of the parallel simplifications, we apply our least constrained variable elections
algorithm (LCVE) [34] prior to simplification. It is responsible for electing a set of mutually
independent variables (candidates) from a set of authorised candidates. The remaining
variables relying on the elected ones are frozen. These notions are defined by Defs. 1-4.


Definition 1 (Authorised candidates). Given a CNF formula $S$, we call $A$ the set of
authorised candidates: $A = \{x \mid 1 \le h[x] \le \mu \vee 1 \le h[\bar{x}] \le \mu\}$, where
– $h$ is a histogram array ($h[x]$ is the number of occurrences of $x$ in $S$).
– $\mu$ denotes a given maximum number of occurrences allowed for both $x$ and its
  negation $\bar{x}$, representing the cut-off point for the LCVE algorithm.


Definition 2 (Candidate Dependency Relation). We call a relation $D : A \times A$ a
candidate dependency relation iff $\forall x, y \in A$, $x \mathrel{D} y$ implies that
$\exists C \in S.\, (x \in C \vee \bar{x} \in C) \wedge (y \in C \vee \bar{y} \in C)$.


Definition 3 (Elected candidates). Given a set of authorised candidates $A$, we call a
set $\varphi \subseteq A$ a set of elected candidates iff
$\forall x, y \in \varphi.\, \neg(x \mathrel{D} y)$.


Definition 4 (Frozen candidates). Given the sets $A$ and $\varphi$, the set of frozen
candidates $F \subseteq A$ is defined as
$F = \{x \mid x \in A \wedge \exists y \in \varphi.\, x \mathrel{D} y\}$.
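
Defs. 1-4 can be illustrated with a greedy sequential election, sketched below. The actual LCVE algorithm is described in [34]; the ordering used here (fewest total occurrences first), the function name, and the integer encoding of literals are assumptions made purely for illustration.

```python
def elect(formula, mu):
    """Greedy election of mutually independent variables (illustrative only)."""
    hist = {}  # h: number of occurrences per literal
    for clause in formula:
        for lit in clause:
            hist[lit] = hist.get(lit, 0) + 1
    variables = sorted({abs(lit) for lit in hist})
    # Def. 1: authorised candidates with respect to the cut-off mu.
    authorised = [v for v in variables
                  if 1 <= hist.get(v, 0) <= mu or 1 <= hist.get(-v, 0) <= mu]
    # Assumed ordering: least constrained (fewest occurrences) first.
    authorised.sort(key=lambda v: hist.get(v, 0) + hist.get(-v, 0))
    elected, frozen = [], set()
    for v in authorised:
        if v in frozen:
            continue  # v shares a clause with an elected variable
        elected.append(v)
        for clause in formula:
            if v in clause or -v in clause:
                frozen.update(abs(lit) for lit in clause if abs(lit) != v)
    return elected, frozen & set(authorised)


# Hypothetical formula: {x1, x2}, {not x1, x3}, {x4, x5}
formula = [frozenset({1, 2}), frozenset({-1, 3}), frozenset({4, 5})]
elected, frozen = elect(formula, mu=2)
```

In this example variables 2, 3 and 4 are elected, since no two of them share a clause, while variables 1 and 5 are frozen because each shares a clause with an elected variable.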


A top-level description of GPU parallel inprocessing is shown in Alg. 2. The blue-colored
lines highlight new contributions of the current work compared to our preprocessing
algorithm presented in [34]. As input, it takes the current formula $S_h$ from the
solver (executed on the host) and copies it to the device global memory as $S_d$ (line 1).

Algorithm 2: Parallel Inprocessing

Input : S_h, μ, phases

 1  S_d ← COPYTODEVICE(S_h);
 2  CALCSIGNATURES(S_d, stream0);
 3  A ← ORDERVARIABLES(S_d, stream1);
 4  while p : 0 → phases do
 5      SYNCALL();   // Synchronize all streams
 6      T ← CREATEOT(S_d);
 7      PROPAGATE(U_h, S_d, T);
 8      φ ← LCVE(S_d, T, A, μ);
 9      if p = phases then
10          ERE(S_d, T, φ);
11          break;
12      SORTOT(T, φ, LISTKEY);
13      U_d ← ELIMINATE(S_d, T, φ);   // Applies VE, SUB, and BCE
14      U_h ← COPYTOHOSTASYNC(U_d, stream1);
15      COLLECT(S_d, stream2);
16      μ ← μ × 2;

17  device function LISTKEY(a, b):
18      C_a ← S_d[a], C_b ← S_d[b];   // C_a = {x_1, x_2, ..., x_k}, C_b = {y_1, y_2, ..., y_k}
19      if |C_a| ≠ |C_b| then return |C_a| < |C_b|;
20      if x_1 ≠ y_1 then return x_1 < y_1;
21      if x_2 ≠ y_2 then return x_2 < y_2;
22      if |C_a| > 2 ∧ (x_k ≠ y_k) then return x_k < y_k;
23      else return sig(C_a) < sig(C_b);

Initially, before simplification, we compute the clause signatures and order variables
via concurrent streams at lines 2-3. A stream is a sequence of instructions that are executed
in issue-order on the GPU [31]. The use of concurrent streams allows the running
of multiple GPU kernels concurrently, if there are enough resources. The ORDERVARIABLES
routine produces an ordered array of authorised candidates $A$ following Def. 1.
The while loop at lines 4-16 applies VE, SUB, and BCE, for a configured number
of iterations (indicated by phases), with increasingly large values of the threshold $\mu$.
Increasing $\mu$ exponentially allows LCVE to elect additional variables in the next elimination
phase since after a phase is executed on the GPU, many elected variables are
eliminated. The ERE method is computationally expensive. Therefore, it is only executed
once in the final iteration, at line 10. At line 5, SYNCALL is called to synchronize
all streams being executed. At line 6, the occurrence table $T$ is created. The LCVE
routine produces on the host side an array of elected mutually independent variables $\varphi$,
in line with Def. 3.


The parallel creation of the occurrence lists in $T$ results in the order of these lists being
chosen non-deterministically. This causes the ELIMINATE procedure called at line
13, which performs the parallel simplifications, to produce results non-deterministically
as well. To remedy this effect, the lists in $T$ are sorted according to a unique key in
ascending order. Besides the benefit of stability, this allows SUB to abort early when
performing subsumption checks. The sorting key function is given as the device function
LISTKEY at lines 17-23. It takes two references a, b and fetches the corresponding
clauses $C_a$, $C_b$ from $S_d$ (line 18). First, clause sizes are tested at line 19. If they are
equal, the first, the second, and the last literal in each clause are checked, respectively,
at lines 20-22. If these literals coincide as well, clause signatures are compared at line 23.
CADICAL implements a similar function, but only considers clause sizes [6]. The SORTOT
routine launches a kernel to sort the lists pointed to by the variables in $\varphi$ in parallel.
Each thread runs an insertion sort to swap clause references in place using LISTKEY.

The ELIMINATE procedure at line 13 calls SUB to remove any subsumed clauses
or strengthen clauses if possible, after which VE is applied, followed by BCE. The
SUB and BCE methods call kernels that scan the occurrence lists of all variables in $\varphi$
in parallel. For more information on this, see [34]. The VE method uses a new parallel
approach, which is explained in Section 6. Both the VE and SUB methods may add new
unit clauses atomically to a separate array $U_d$. The propagation of these units cannot be
done immediately on the GPU due to possible data races, as multiple variables in a
clause may occur in unit clauses. For instance, if we have unit clauses {a} and {b},
and these would be processed by different threads, then a clause {a, b, c} could be
updated by both threads simultaneously. Thus, this propagation is delayed until the
next iteration, and performed by the host at line 7. Note that $T$ must be recreated first
to consider all resolvents added by VE during the previous phase. The ERE method at
line 10 is executed only once at the last phase (phases) before the loop is terminated.
Section 7 explains in detail how ERE can be effective in simplifying both ORIGINAL
and LEARNT clauses in parallel. At line 14, new units are copied from the device to the
host array $U_h$ asynchronously via stream1. The COLLECT procedure does the GC as
described by Alg. 1 via stream2. Both streams are synchronized at line 5.
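
The LISTKEY ordering and the per-list insertion sort described earlier in this section can be sketched as follows. Clauses are modelled as tuples of non-negative integer literals, and the signature function is a simple assumed hash (one bit per literal modulo 32), not necessarily the one used by the solver. A guard for clauses with a single literal is added, and clauses are assumed to be non-empty.

```python
def signature(clause):
    """Assumed 32-bit signature: one bit per literal, taken modulo 32."""
    sig = 0
    for lit in clause:
        sig |= 1 << (lit % 32)
    return sig


def list_key_less(ca, cb):
    """True iff clause ca is ordered before clause cb."""
    if len(ca) != len(cb):
        return len(ca) < len(cb)
    if ca[0] != cb[0]:
        return ca[0] < cb[0]
    if len(ca) > 1 and ca[1] != cb[1]:
        return ca[1] < cb[1]
    if len(ca) > 2 and ca[-1] != cb[-1]:
        return ca[-1] < cb[-1]
    return signature(ca) < signature(cb)


def sort_occurrence_list(refs, clauses):
    """In-place insertion sort of clause references using list_key_less."""
    for i in range(1, len(refs)):
        ref, j = refs[i], i - 1
        while j >= 0 and list_key_less(clauses[ref], clauses[refs[j]]):
            refs[j + 1] = refs[j]
            j -= 1
        refs[j + 1] = ref
    return refs


# Made-up clause database and one occurrence list referring to it.
clauses = [(2, 5, 9), (2, 5), (1, 7), (2, 5, 8)]
ordered = sort_occurrence_list([0, 1, 2, 3], clauses)
```

The two binary clauses come first, ordered by their first literal, followed by the two ternary clauses, which differ only in their last literal. Because the key depends only on clause contents, the sorted order is the same regardless of the order in which references were inserted.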


6 Three-Phase Parallel Variable Elimination 


The BVIPE algorithm in our previous work [34] had a main shortcoming due to the 
heavy use of atomic operations to add new resolvents. Per eliminated variable, two 
atomic instructions were performed, one for adding new clauses and the other for 
adding new literals. Besides performance degradation, this also resulted in the order 
of added clauses being chosen non-deterministically, which impacted reproducibility 
(even though the produced formula would always at least be logically the same). 

The approach to avoiding the excessive use of atomic instructions when adding
new resolvents is to perform parallel VE in three phases. The first phase scans the
constructed list $\varphi$ to identify the elimination type (e.g., resolution or gate
substitution) of each variable and to calculate the number of resolvents and their
corresponding buckets.

The second phase computes an exclusive scan to determine the new references for 
adding resolvents, as is done in our GC mechanism (Section 4). At the last phase, we 
store the actual resolvents in their new locations in the simplified formula. For solution 
reconstruction, we use an atomic addition to count the resolved literals. The order in 
which they are resolved is irrelevant. The same is done for adding units. For the latter, 
experiments show that the number of added units is relatively small compared to the 
eliminated variables, hence the penalty of using atomic instructions is almost negligible. 
It would be overkill to use a segmented scan for adding literals or units. 
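
The count, scan, and write pattern behind the three phases can be sketched sequentially as follows. This is only an illustration under simplifying assumptions: the names exclusive_scan and three_phase_append are hypothetical, each "job" stands for the resolvents of one eliminated variable, and a plain Python list stands in for the clause storage on the device.

```python
from itertools import accumulate


def exclusive_scan(counts, offset=0):
    """Exclusive prefix sum: out[i] = offset + counts[0] + ... + counts[i-1]."""
    return [offset + s for s in accumulate([0] + counts[:-1])]


def three_phase_append(store, jobs):
    """Append the items of every job to `store` without a shared counter."""
    # Phase 1: every job only reports how many items it is going to add.
    counts = [len(job) for job in jobs]
    # Phase 2: an exclusive scan turns the counts into start positions,
    # offset by the current size of the store.
    starts = exclusive_scan(counts, offset=len(store))
    # Phase 3: after a single resize, each job writes into its own
    # disjoint slice, so the final order does not depend on scheduling.
    store.extend([None] * sum(counts))
    for job, start in zip(jobs, starts):
        store[start:start + len(job)] = job
    return starts
```

Because every job owns a disjoint slice, the writes in the last phase need no atomic counter, and the resulting order is the same on every run.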

At line 1 of Alg. 3, phase 1 is executed by the VARIABLESWEEP kernel (given at
lines 15-27). Every thread scans the clause sets of its designated literals x and x̄ (line 17).
References to these clauses are stored in T_x and T_x̄. Moreover, register variables t, β, γ
are created to hold the current type, number of added clauses, and number of added
literals of x, respectively. If x is pure at line 19, then there are no resolvents to add and
the clause sets of x and x̄ are directly marked as DELETED by the routine TOBLIVION.
Moreover, this routine adds the marked literals atomically to resolved. At line 22, we
Algorithm 3: Three-Phase Parallel Variable Elimination

Input: __global__ φ, S_d, T, U_d, resolved, type, buckets, added; __constant__ α

 1  resolved, type, buckets, added ← VARIABLESWEEP(φ, S_d, T);
 2  last_added ← 1, last_idx ← 1, last_cref ← 1, last_C ← 0;
 3  for j : |φ| − 1, j − 1 → 0 do      // find index and # resolvents of last eliminated x
 4      if type[j] ≠ 0 then
 5          last_idx ← j, last_added ← added[j]; break;
 6  buckets ← EXCLUSIVESCAN(buckets, SIZE(clauses), stream0);
 7  added ← EXCLUSIVESCAN(added, SIZE(references), stream1);
 8  SYNCALL();
 9  numCls ← last_added + added[last_idx];
10  last_cref ← references[numCls − 1], last_C ← clauses[last_cref];
11  numBuckets ← last_cref + (α + SIZE(last_C) − 1);
12  RESIZE(clauses, numBuckets), RESIZE(references, numCls);
13  S_d, U_d ← VARIABLERESOLVENT(φ, S_d, T, type, buckets, added);

15  kernel VARIABLESWEEP(φ, S_d, T):
16      for all i ∈ ⟦0, |φ|⟧ in parallel
17          register x ← φ[i], T_x ← T[x], T_x̄ ← T[x̄], t ← NONE, β ← 0, γ ← 0;
18          type[i] ← 0, buckets[i] ← 0, added[i] ← 0;      // initially reset
19          if T_x = ∅ ∨ T_x̄ = ∅ then                       // check if x is a pure literal
20              resolved ← TOBLIVION(x, S_d, T_x, T_x̄);
21          else
22              t, β, γ ← GATEREASONING(x, S_d, T_x, T_x̄);
23              if t ≠ GATE then
24                  t, β, γ ← MAYRESOLVE(x, S_d, T_x, T_x̄);  // t may be set to RESOLUTION
25              if t ≠ 0 then                               // x can be eliminated
26                  type[i] ← t, added[i] ← β, buckets[i] ← α × β + (γ − β);
27                  resolved ← TOBLIVION(x, S_d, T_x, T_x̄);
28  kernel VARIABLERESOLVENT(φ, S_d, T, type, buckets, added):
29      for all i ∈ ⟦0, |φ|⟧ in parallel
30          register x ← φ[i], T_x ← T[x], T_x̄ ← T[x̄];
31          register t ← type[i], cref ← buckets[i], rpos ← added[i];
32          if t = RESOLUTION then
33              (S_d, U_d) ← (S_d, U_d) ∪ RESOLVE(x, S_d, T_x, T_x̄, rpos, cref);
34          if t = GATE then
35              (S_d, U_d) ← (S_d, U_d) ∪ SUBSTITUTE(x, S_d, T_x, T_x̄, rpos, cref);

check first if x contributes to a logical gate using the routine GATEREASONING, and
save the corresponding β and γ. If this is the case, the type t is set to GATE, otherwise
we try resolution at line 24. The condition β ≤ (|T_x| + |T_x̄|) is tested implicitly by
MAYRESOLVE to limit the number of resolvents per x. If t is set to a nonzero value
(line 25), the type and added arrays are updated correspondingly. The total number of
buckets needed to store all added clauses is calculated by the formula (α × β + (γ − β))
and stored in buckets[i] at line 26. After type and added have been completely
constructed, the loop at lines 3-4 identifies the index of the last variable eliminated
starting from position |φ| − 1. If the condition at line 4 holds, index j and the number of
underlying resolvents are saved to last_idx and last_added, respectively. These values will
be used later to set the new size of the simplified formula S_d on the host side.
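
The bucket formula at line 26 can be checked against the per-clause cost used at line 11 of Alg. 3, where a clause of a given size occupies α + SIZE − 1 buckets. The short sketch below does this for one variable. The value of ALPHA and the resolvent sizes are made up for illustration, and the function names are hypothetical.

```python
def clause_buckets(alpha, size):
    """Buckets taken by one clause with `size` literals (cost used at line 11)."""
    return alpha + size - 1


def variable_buckets(alpha, beta, gamma):
    """Formula at line 26: buckets for beta resolvents holding gamma literals in total."""
    return alpha * beta + (gamma - beta)


ALPHA = 3                     # hypothetical value of the constant alpha
resolvent_sizes = [2, 4, 3]   # made-up sizes of the resolvents of one variable
beta = len(resolvent_sizes)   # number of added clauses
gamma = sum(resolvent_sizes)  # number of added literals
per_clause_total = sum(clause_buckets(ALPHA, s) for s in resolvent_sizes)
```

Summing the per-clause cost over all resolvents of a variable gives β × (α − 1) + γ, which is the same quantity as α × β + (γ − β).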

Phase 2 is now ready to apply EXCLUSIVESCAN on the added and buckets lists.
Both clauses and references refer to the structural members of S_d, as described
in Fig. 1b. The procedure at line 6 takes the old size of clauses to offset the calculated
references of the added resolvents. The SIZE routine returns the size of the input
structure. Similarly, the second call at line 7 takes the old size of references and calculates
the new indices for storing new references. Both scans are executed concurrently
Algorithm 4: Parallel Eager Redundancy Elimination for Inprocessing

Input: __global__ φ, S_d, T

 1  kernel ERE(φ, S_d, T):
 2      for all i ∈ ⟦0, |φ|⟧^y in parallel
 3          x ← φ[i];
 4          for C ∈ S_d[T[x]] do
 5              for C′ ∈ S_d[T[x̄]] do
 6                  if (C_m ← RESOLVE(x, C, C′)) ≠ ∅ then
 7                      if state(C) = LEARNT ∨ state(C′) = LEARNT then
 8                          st ← LEARNT
 9                      else
10                          st ← ORIGINAL
11                      FORWARDEQUALITY(C_m, S_d, T, st);
12  device function FORWARDEQUALITY(C_m, S_d, T, st):
13      minList ← FINDMINLIST(T, C_m);
14      for all i ∈ ⟦0, |minList|⟧^x in parallel
15          C ← S_d[minList[i]];
16          if C = C_m ∧ (state(C) = LEARNT ∨ state(C) = st) then state(C) ← DELETED;


via stream0 and stream1, and are synchronized by the SYNCALL call at line 8. After
the exclusive scan, the last element in added gives the total number of clauses in S_d
minus the resolvents added by the last eliminated variable. Therefore, adding this value
to last_added gives the total number of clauses in S_d (line 9). At line 10, the last clause
last_C and its reference last_cref are fetched. At line 11, the number of buckets of last_C
is added to last_cref to get the total number of buckets numBuckets. The numBuckets and
numCls are used to resize clauses and references, respectively, at line 12.
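
As a small worked example of lines 3-9 of Alg. 3, the sketch below recomputes numCls on the host from made-up type and added arrays. The function names are hypothetical, and the sketch assumes that at least one variable was eliminated.

```python
from itertools import accumulate


def exclusive_scan(counts, offset=0):
    """Exclusive prefix sum, shifted by `offset`."""
    return [offset + s for s in accumulate([0] + counts[:-1])]


def new_clause_count(types, added, old_num_cls):
    """Total number of clause references after adding all resolvents."""
    # Lines 3-5: locate the last eliminated variable before the scan
    # overwrites its resolvent count.
    last_idx = max(j for j, t in enumerate(types) if t != 0)
    last_added = added[last_idx]
    # Line 7: the scan is offset by the old number of references.
    scanned = exclusive_scan(added, offset=old_num_cls)
    # Line 9: start position of the last variable plus its own resolvents.
    return last_added + scanned[last_idx]
```

With 10 existing clauses and two eliminated variables adding 3 and 2 resolvents, the scanned array is [10, 10, 13, 13, 15] and the function returns 2 + 13 = 15.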

Finally, in phase 3, we use the calculated indices in added and buckets to guide
the new resolvents to their locations in S_d. The kernel is described at lines 28-35. Each
thread either calls the procedure RESOLVE or SUBSTITUTE, based on the type stored
for the designated variables. Any produced units are saved into U_d atomically. The cref
and rpos variables indicate where resolvents should be stored in S_d per variable x.


7 Eager Redundancy Elimination 


Alg. 4 describes a two-dimensional kernel, in which an x and a y coordinate are derived
from each thread ID. This allows us to use two nested grid-stride loops. In the loops, we
specify which of the two coordinates should be used to initialise i in the first iteration.

Based on the kernel's y-dimension ID (line 2), each thread merges, where possible,
two clauses of its designated variable x and its complement x̄ (lines 3-6), and writes the
result in shared memory as C_m. This new clause is produced by the routine RESOLVE
at line 6. At lines 7-10, we check if one of the resolved clauses is LEARNT, and if so, the
state st of C_m is set to LEARNT as well, otherwise it is set to ORIGINAL. This state of
C_m will guide the FORWARDEQUALITY routine called at line 11 to search for redundant
clauses of the same type. This routine is a device function, as it can only be called from
a kernel, and is described at lines 12-16. In this function, the x-dimension of the thread
ID is used to search the clauses referenced by the minimum occurrence list minList,
which is produced by FINDMINLIST at line 13. It has the minimum size among the lists
of all literals in C_m. If a clause C is found that is equal to C_m and is either LEARNT or
has a state equal to the one of C_m, it is set to DELETED (line 16).
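
A sequential sketch of the ERE logic may help to follow Alg. 4. It is not the GPU kernel: the two parallel loops are replaced by ordinary loops, clauses are frozensets of signed integers, and the occurrence table is rebuilt inside the function. Skipping clauses that are already DELETED as resolution candidates is an assumption of this sketch, not something stated in the text.

```python
ORIGINAL, LEARNT, DELETED = "ORIGINAL", "LEARNT", "DELETED"


def resolve(x, c1, c2):
    """Resolvent of c1 (containing x) and c2 (containing -x) on x.

    Returns None if the resolvent is empty or tautological.
    """
    merged = (c1 - {x}) | (c2 - {-x})
    if not merged or any(-lit in merged for lit in merged):
        return None
    return merged


def ere(variables, clauses, states):
    """Mark clauses that equal a resolvent as DELETED (sequential sketch)."""
    occ = {}  # occurrence table: literal -> indices of clauses containing it
    for idx, clause in enumerate(clauses):
        for lit in clause:
            occ.setdefault(lit, []).append(idx)
    for x in variables:
        for i in occ.get(x, []):
            for j in occ.get(-x, []):
                # Assumption of this sketch: deleted clauses are not resolved.
                if DELETED in (states[i], states[j]):
                    continue
                cm = resolve(x, clauses[i], clauses[j])
                if cm is None:
                    continue
                st = LEARNT if LEARNT in (states[i], states[j]) else ORIGINAL
                # Search only the shortest occurrence list among cm's literals.
                min_list = min((occ.get(lit, []) for lit in cm), key=len)
                for k in min_list:
                    if clauses[k] == cm and states[k] in (LEARNT, st):
                        states[k] = DELETED
    return states
```

Searching only the shortest occurrence list is sufficient because a clause equal to C_m must contain every literal of C_m, and therefore appears in each of those lists.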
Fig. 3: Speedup of the proposed VE and GC algorithms on the benchmark suite


8 Experiments 


We implemented the proposed algorithms in PFROST-GPU³ with CUDA C++ version
11.0 [31]. We evaluated all GPU experiments on an NVIDIA Titan RTX GPU. This
GPU has 72 SMs (64 cores each), 24 GB global memory and 48 KB shared memory.
The GPU operates at a base clock of 1.3 GHz (boost: 1.7 GHz). The GPU machine was
running Linux Mint v20 with an Intel Core i5-7600 CPU of 3.5 GHz base clock speed
(turbo: 4.1 GHz) and a system memory of 32 GB.

We selected 493 SAT problems from the 2013-2020 SAT competitions. All formu- 
las larger than 5 MB in size are chosen, excluding redundancies (repeated CNFs across 
competitions). For very small problems, the GPU is not really needed, as only few vari- 
ables and clauses can be removed. The selected problems encode around 70+ different 
real-world applications, with various logical properties. 

In the experiments, besides the implementations of our new GPU algorithms, we
involved a CPU-only version of PARAFROST (PFROST-CPU) and the CADICAL [6]
SAT solver for the solving of problems, and executed these on the compute nodes of
the Lisa CPU cluster⁴. Each problem was analysed in isolation on a separate computing
node. Each computing node had an Intel Xeon Gold 6130 CPU running at a base clock
speed of 2.1 (turbo: 3.7) GHz with 96 GB of system memory, and ran the Debian Linux
operating system. With this information, we adhere to all five principles laid out in the
SAT manifesto (version 1) [9], noting that we also included problems older than three
years, to have a sufficient number of large problems to work with.


SAT-Simplification Speedup. Figure 3 shows the performance evaluation of the
GPU Algorithms 1 and 3 compared to their previous implementations in SIGMA [34].
For these experiments, we set μ and phases initially to 32 and 5, respectively. Preprocessing
is only enabled to measure the speedup. Fig. 3a shows the speedup of running
parallel GC against a sequential version on the host. Clearly, for almost all cases, Alg. 1
achieved a drastic acceleration when executed on the device, with a maximum speedup
of 93x and an average of 48x. Fig. 3b reveals how fast the 3-phase parallel VE is


3 Solvers/formulas are available at https://gears.win.tue.nl/software/parafrost.
4 This work was carried out on the Dutch national e-infrastructure with the support of SURF
Cooperative.
compared to the version using more atomic instructions. On average, the new algorithm is
twice as fast as the old BVIPE algorithm [34]. In addition, we get reproducible results. 


SAT-Solving. These experiments provide a thorough assessment of our CPU/GPU 
solver, the CPU-only version, and CADICAL on SAT solving with preprocessing + 
inprocessing turned on. The features walksat, vivification and probing [6] are disabled 
in CADICAL as they are not yet supported in PARAFROST. As in PARAFROST, 
all elimination methods in CADICAL are turned on with a bound on the occurrence 
list size set to 30,000. The same parameters for the search heuristics are used for all 
experiments. However, we delay the scheduling of inprocessing in PARAFROST until
4,000 of the fixed (root) variables are removed. The occurrence limit μ is bounded by
32 in CADICAL. On the other hand, we start with 32 and double this value every new
phase as shown in Alg. 2. These extensions increase the likelihood of doing more work
on the GPU. The timeout for all experiments is set to 5,000 seconds. The timeout for 
the sequential solvers has a 6% tolerance (i.e., is 5,300 seconds in total) to compensate 
for the different CPU frequencies of the GPU machine and the cluster nodes. 

Figure 4 demonstrates the runtime results for all solvers over the benchmark suite.
Subplot (a) shows the total time (simplify + solving) for all formulas. Data are sorted
w.r.t. the x-axis. The simplify time accounts for data transfers in PFROST-GPU. Overall,
PFROST-GPU dominates over PFROST-CPU and CADICAL. Subplot (b) demonstrates
the solving impact of PFROST-GPU versus CADICAL on SAT/UNSAT formulas.
PFROST-GPU seems more effective on UNSAT formulas than CADICAL. Collectively,
PFROST-GPU performed faster on 196 instances (58% out of all solved), of
which 18 formulas were unsolved by CADICAL.

Subplots (c) and (d) show simplification time and its percentage of the total process- 
ing time, respectively. Clearly, the CPU/GPU solver outperforms its sequential counter- 
part due to the parallel acceleration. Plot (d) tells us that PFROST-GPU keeps the 
workload in the region between 0 and 20% as the elimination methods are scheduled 
on a bulk of mutually independent variables in parallel. In CADICAL, variables and 
clauses are simplified sequentially, which takes more time. Plot (e) shows the effective- 
ness of ERE on formulas with successful clause reductions. The last plot (f) reflects the 
overall efficiency of parallel inprocessing on variables and clauses (learnt clauses are 
included). Data are sorted in descending order. Reductions can remove up to 90% and 
80% of the variables and clauses, respectively. 


9 Related Work 


A simple GC monitor for GPU term rewriting has been proposed by van Eerd et al. [18]. 
The monitor tracks deleted terms and stores their indices in a list. New terms can be 
added at those indices. The authors in [1,26] investigated the challenges for offload- 
ing garbage collectors to an Accelerated Processing Unit (APU). Matthias et al. [39] 
introduced a promising alternative for stream compaction [10] via parallel defragmen- 
tation on GPUs. Our GC, on the other hand, is tailored to SAT solving, which allows 
it to be simple yet efficient. Regarding inprocessing, Järvisalo et al. [23] introduced
certain rules to determine how and when inprocessing techniques can be applied.
Acceleration of the DPLL SAT solving algorithm on a GPU has been done in [15], where
some parts of the search were performed on a GPU and the remainder was handled by
the CPU. Incomplete approaches are more amenable to being executed entirely on a GPU,
e.g., an approach using metaheuristic algorithms [44]. We are the first to work on GPU
inprocessing in modern CDCL solvers.


10 Conclusion 


We have shown that GPU-accelerated inprocessing significantly reduces simplification 
time in SAT solving, allowing more problems to be solved. Parallel ERE and VE can be 
performed efficiently on many-core systems, producing impactful reductions on both 
original and learnt clauses in a fraction of a second, even for large problems. The pro- 
posed parallel GC achieves a substantial speedup in compacting SAT formulas on a 
GPU, while stimulating coalesced accessing of clauses. 

Concerning future work, the results suggest that it is worthwhile to take the capabilities
of GPU inprocessing further by supporting more simplification techniques.
Abstract. Form validators based on regular expressions are often used
on digital forms to prevent users from inserting data in the wrong format. 
However, writing these validators can pose a challenge to some users. 
We present FOREST, a regular expression synthesizer for digital form 
validations. FOREST produces a regular expression that matches the de- 
sired pattern for the input values and a set of conditions over capturing 
groups that ensure the validity of integer values in the input. Our syn- 
thesis procedure is based on enumerative search and uses a Satisfiability 
Modulo Theories (SMT) solver to explore and prune the search space. We 
propose a novel representation for regular expressions synthesis, multi- 
tree, which induces patterns in the examples and uses them to split the 
problem through a divide-and-conquer approach. We also present a new 
SMT encoding to synthesize capture conditions for a given regular ex- 
pression. To increase confidence in the synthesized regular expression, 
we implement user interaction based on distinguishing inputs. 

We evaluated FOREST on real-world form-validation instances using reg- 
ular expressions. Experimental results show that FOREST successfully 
returns the desired regular expression in 70% of the instances and out- 
performs REGEL, a state-of-the-art regular expression synthesizer. 


1 Introduction 


Regular expressions (also known as regexes) are powerful mechanisms for de- 
scribing patterns in text with numerous applications. One notable use of regexes 
is to perform real-time validations on the input fields of digital forms. Regexes 
help filter invalid values, such as typographical mistakes (‘typos’) and format 
inconsistencies. Aside from validating the format of form input strings, regular 
expressions can be coupled with capturing groups. A capturing group is a sub-regex
within a regex that is indicated with parentheses and captures the text


* This work was supported by NSF award CCF-1762363 and through FCT under 
project UIDB/50021/2020, and project ANI 045917 funded by FEDER and FCT. 
© The Author(s) 2021 
matched by the sub-regex inside it. Capturing groups are used to extract information
from text and, in the domain of form validation, they can be used to
enforce conditions over values in the input string. In this paper, we focus on the
capture of integer values in input strings, and we use the notation $i, i ∈ {0, 1, ...},
to refer to the integer value of the text captured by the (i + 1)-th group.

Form validations often rely on complex regexes which require programming 
skills that not all users possess. To help users write regexes, prior work has pro- 
posed to synthesize regular expressions from natural language [1,9,12,27] or from 
positive and negative examples [1,7,10,26]. Even though these techniques assist 
users in writing regexes for search and replace operations, they do not specifi- 
cally target digital form validation and do not take advantage of the structured 
format of the data. 

In this paper, we propose FOREST, a new program synthesizer for regular ex- 
pressions that targets digital form validations. FOREST takes as input a set of ex- 
amples and returns a regex validation. FOREST accepts three types of examples: 
(i) valid examples: correct values for the input field, (ii) invalid examples: 
incorrect values for the input field due to their format, and (iii) conditional 
invalid examples (optional): incorrect values for the input field due to their 
values. FOREST outputs a regex validation, consisting of two components: (i) a 
regular expression that matches all valid and none of the invalid examples 
and (ii) capture conditions that express integer conditions that are satisfied 
by the values on all the valid but none of the conditional invalid examples. 


Motivating Example. Suppose a user is writing a form where one of the fields 
is a date that must respect the format DD/MM/YYYY. The user wants to accept: 


19/08/1996 22/09/2000 29/09/2003 

26/10/1998 01/12/2001 31/08/2015 
But not: 

19/08/96 22.09.2000 29/9/2003 

26-10-1998 1/12/2001 2015/08/31 


A regular expression can be used to enforce this format. Instead of writing it, the
user may simply give the two sets of values as valid and invalid input examples
to FOREST, which will output the regex [0-9]{2}/[0-9]{2}/[0-9]{4}.

Additionally, if the user wants to validate not only the format, but also the 
values in the date, we can consider as conditional invalid the examples: 


33/08/1996 22/13/2000 12/31/2003 
26/00/1998 00/12/2001 52/03/2015 


FOREST will output a regex validation complete with conditions over capturing
groups that ensure only valid values are inserted as the day and month:
([0-9]{2})/([0-9]{2})/[0-9]{4}, $0 ≤ 31 ∧ $0 ≥ 1 ∧ $1 ≤ 12 ∧ $1 ≥ 1.
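
To make the two components of a regex validation concrete, the following sketch applies the date validation from the motivating example with Python's re module. The function name is_valid is hypothetical, and the capture conditions are hard-coded here rather than synthesized.

```python
import re

# Regular expression with two capturing groups: $0 is the day, $1 the month.
PATTERN = re.compile(r"([0-9]{2})/([0-9]{2})/[0-9]{4}")


def is_valid(text):
    """Check the format first, then the capture conditions over $0 and $1."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return False  # rejected because of its format
    captures = [int(group) for group in match.groups()]
    return 1 <= captures[0] <= 31 and 1 <= captures[1] <= 12
```

An input such as 22.09.2000 is rejected by the regular expression alone, whereas 22/13/2000 matches the format and is rejected only by the condition $1 ≤ 12.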


As we can see in the motivating example, data inserted into digital forms is 
usually structured and shares a common pattern among the valid examples. In 
this example, the data has the shape dd/dd/dddd where d represents a digit. This 
contrasts with general regexes for search and replace operations that are often
performed over unstructured text. FOREST takes advantage of this structure by 
automatically detecting these patterns and using a divide-and-conquer approach 
to split the expression into simpler sub-expressions, solving them independently, 
and then merging their information to obtain the final regular expression. Addi- 
tionally, FOREST computes a set of capturing groups over the regular expression, 
which it then uses to synthesize integer conditions that further constrain the ac- 
cepted values for that form field. 

Input-output examples do not require specialized knowledge and are accessi- 
ble to users. However, there is one downside to using examples as a specification: 
they are ambiguous. There can be solutions that, despite matching the exam- 
ples, do not produce the desired behavior in situations not covered in them. 
The ambiguity of input-output examples raises the necessity of selecting one 
among multiple candidate solutions. To this end, we incorporate a user interac- 
tion model based on distinguishing inputs for both the synthesis of the regular 
expressions and the synthesis of the capture conditions. 

In summary, this paper makes the following contributions: 


— We propose a multi-tree SMT representation for regular expressions that 
leverages the structure of the input to apply a divide-and-conquer approach. 

— We propose a new method to synthesize capturing groups for a given regular 
expression and integer conditions over the resulting captures. 

— We implemented a tool, FOREST, that interacts with the user to disam- 
biguate the provided specification. FOREST is evaluated on real-world in- 
stances and its performance is compared with a state-of-the-art synthesizer. 


2 Synthesis Algorithm Overview 


The task of automatically generating a program that satisfies some desired be- 
havior expressed as a high-level specification is known as Program Synthesis. 
Programming by Example (PBE) is a branch of Program Synthesis where the 
desired behavior is specified using input-output examples. 

Our synthesis procedure is split into two stages, each relative to an output 
component. First, FOREST synthesizes the regular expression, which is the basis 
Figure 2: Interactive enumerative search


for the synthesis of capturing groups. Secondly, FOREST synthesizes the capture 
conditions, by first computing a set of capturing groups and then the conditions 
to be applied to the resulting captures. The synthesis stages are detailed in sec- 
tions 3 and 4. Figure 1 shows the regex validation synthesis pipeline. Both stages 
of our synthesis algorithm employ enumerative search, a common approach to 
solve the problem of program synthesis [4,5,10,17,21]. The enumerative search 
cycle is depicted in Figure 2. 

There are two key components for program enumeration: the enumerator 
and the verifier. The enumerator successively enumerates programs from
a predefined Domain Specific Language (DSL). Following the Occam's razor
principle, programs are enumerated in increasing order of complexity. The DSL 
defines the set of operators that can be used to build the desired program. 
FOREST dynamically constructs its DSL to fit the problem at hand: it is as 
restricted as possible, without losing the necessary expressiveness. The regular 
expression DSL construction procedure is detailed in section 3.1. 

For each enumerated program, the verifier subsequently checks whether it 
satisfies the provided examples. Program synthesis applications generate very 
large search spaces; nevertheless, the search space can be significantly reduced by 
pruning several infeasible expressions along with each incorrect expression found. 
In the first stage of the regex validation synthesis, the enumerated programs 
are regular expressions. The enumeration and pruning of regular expressions is 
described in section 3.2. In the second stage of regex validation synthesis, we deal 
with the enumeration of capturing groups over a pre-existing regular expression. 
This process is described in section 4.1. 
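
The enumerator and verifier cycle can be illustrated with a deliberately naive sketch. It is not FOREST's SMT-based enumerator: candidates are plain concatenations of a few hand-picked atoms, complexity is measured by the number of atoms, and nothing is pruned. The function name synthesize and the atom set are assumptions made for this example.

```python
import re
from itertools import product


def synthesize(valid, invalid, atoms, max_parts):
    """Return the first concatenation of atoms consistent with the examples."""
    for n in range(1, max_parts + 1):             # increasing complexity
        for parts in product(atoms, repeat=n):    # enumerator
            candidate = "".join(parts)
            regex = re.compile(candidate)
            # Verifier: accept all valid examples and reject all invalid ones.
            if all(regex.fullmatch(s) for s in valid) and \
                    not any(regex.fullmatch(s) for s in invalid):
                return candidate
    return None


# A tiny hand-written DSL: two literals, each optionally with a range quantifier.
ATOMS = [lit + q for lit in ("[0-9]", "/") for q in ("", "{2}", "{4}")]
```

On the valid and invalid dates of the motivating example, this search has to go through all shorter concatenations before reaching a consistent one with five atoms, which is the kind of blow-up that pruning and the multi-tree representation are meant to contain.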

To circumvent the ambiguity of input-output examples, FOREST implements 
an interaction model. A new component, the distinguisher, ascertains, for any two 
given programs, whether they are equivalent. When FOREST finds two different 
validations that satisfy all examples, it creates a distinguishing input: a new 
input that has a different output for each validation. To disambiguate between 
two programs, FOREST shows the new input to the user, who classifies it as valid 
or invalid, effectively choosing one program over the other. The new input-output 
pair is added to the examples, and the enumeration process continues until there 
is only one solution left. This interactive cycle is described for the synthesis of 
regular expressions in section 3.3 and capture conditions in section 4.3. 
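
A distinguishing input can be illustrated without an SMT solver by brute force over short strings. The sketch below is a stand-in for the distinguisher, not FOREST's implementation: the function name, the alphabet, and the length bound are all assumptions of the example.

```python
import re
from itertools import product


def distinguishing_input(regex_a, regex_b, alphabet, max_len):
    """Shortest string over `alphabet` accepted by exactly one of the regexes."""
    a, b = re.compile(regex_a), re.compile(regex_b)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            if bool(a.fullmatch(text)) != bool(b.fullmatch(text)):
                return text
    return None  # no difference found up to max_len
```

If the user classifies the returned string as valid, the candidate that rejects it is discarded, and vice versa; the string and its classification then join the examples.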
Figure 3: [0-9]{2}/[0-9]{2}/[0-9]{4} represented as a k-tree with k = 2


3 Regular Expressions Synthesis 


In this section we describe the enumerative synthesis procedure that generates 
a regular expression that matches all valid examples and none of the invalid. 


3.1 Regular Expressions DSL 


Before the synthesis procedure starts, we define which operators can be used 
to build the desired regular expression and the values each operator can take 
as argument. FOREST’s regular expression DSL includes the regex union and 
concatenation operators, as well as several regular expression quantifiers: 


— Kleene closure: r* matches r zero or more times, 

— positive closure: r+ matches r one or more times,

— option: r? matches r zero or one times, 

— ranges: r{m} matches r exactly m times, and r{m,n} matches r at least m 
times and at most n times. 


The possible values for the range operators are limited depending on the valid
examples provided by the user. For the single-valued range operator, r{m}, we
consider only the integer values such that 2 ≤ m ≤ l, where l is the length of
the longest valid example string. In the two-valued range operator, r{m,n}, the
values of m and n are limited to integers such that 0 ≤ m < n ≤ l. The tuple
(0,1) is not considered, since it is equivalent to the option quantifier: r{0,1} = r?.
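
The admissible range arguments can be enumerated directly from the valid examples. The sketch below reads the bounds as 2 ≤ m ≤ l for r{m} and 0 ≤ m < n ≤ l for r{m,n}; this reading and the function name range_arguments are assumptions of the example.

```python
def range_arguments(valid_examples):
    """Arguments considered for r{m} and r{m,n}, given the valid examples."""
    longest = max(len(s) for s in valid_examples)  # l in the text
    singles = list(range(2, longest + 1))
    pairs = [(m, n)
             for m in range(0, longest)
             for n in range(m + 1, longest + 1)
             if (m, n) != (0, 1)]  # r{0,1} is the option quantifier r?
    return singles, pairs
```

For the dates of the motivating example, l = 10, so r{m} is tried with m from 2 to 10 only.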

All operators can be applied to regex literals or composed with each other 
to form more complex expressions. The regex literals considered in the syn- 
thesis procedure include the individual letters, digits or symbols present in the 
examples and all character classes that contain them. The character classes con- 
templated in the DSL are [0-9], [A-Z], [a-z] and all combinations of those, 
such as [A-Za-z] or [0-9A-Za-z]. Additionally, [0-9A-F] and [0-9a-f] are 
used to represent hexadecimal numbers. 


3.2 Regex Enumeration 


To enumerate regexes, the synthesizer requires a structure capable of represent- 
ing every feasible expression. We use a tree-based representation of the search 
Figure 4: [0-9]{2}/[0-9]{2}/[0-9]{4} represented as a multi-tree with n = 5
and k = 2, resulting from the concatenation of 5 simpler regexes


space. A k-tree of depth d is a tree in which every internal node has exactly 
k children and every leaf node is at depth d. A program corresponds to an assignment
of a DSL construct to each tree node; the node's children are the
construct's arguments. If k is the greatest arity among all DSL constructs, then
a k-tree of depth d can represent all programs of depth up to d in that DSL.
The arity of constructs in FOREST's regex DSLs is at most 2, so all regexes in
the search space can be represented using 2-trees. To allow constructs with arity
smaller than k, some child nodes are assigned the empty symbol, ε. In Figure 3,
the regex from the motivating example, [0-9]{2}/[0-9]{2}/[0-9]{4},
is represented as a 2-tree of depth 5.

To explore the search space in order of increasing complexity, we enumerate 
k-trees of lower depths first and progressively increase the depth of the trees 
as previous depths are exhausted. The enumerator encodes the k-tree as an 
SMT formula that ensures the program is well-typed. A model that satisfies the 
formula represents a valid regex. Due to space constraints we omit the k-tree 
encoding but further details can be found in the literature [2,17]. 


Multi-tree representation. We considered several validators for digital forms 
and observed that many regexes in this domain are the concatenation of rela- 
tively simple regexes. However, the successive concatenation of simple regexes 
quickly becomes complex in its k-tree representation. Recall the regex for date
validation presented in the motivating example: [0-9]{2}/[0-9]{2}/[0-9]{4}.
Even though this is the concatenation of 5 simple sub-expressions, each of depth 
at most 2, its representation as a k-tree has depth 5, as shown in Figure 3. 

The main idea behind the multi-tree constructs is to allow the number of 
concatenated sub-expressions to grow without it reflecting exponentially on the 
encoding. The multi-tree structure consists of n k-trees, whose roots are con- 
nected by an artificial root node, interpreted as an n-ary concatenation opera- 
tor. This way, we are able to represent regexes using fewer nodes. Figure 4 is 
the multi-tree representation of the same regex as Figure 3, and shows that the 
multi-tree construct can represent this expression using half the nodes. 

The k-tree enumerator successively explores k-trees of increasing depth. How- 
ever, multi-tree has two measures of complexity: the depth of the trees, d, and 
the number of trees, n. FOREST employs two different methods for increasing 
these values: static multi-tree and dynamic multi-tree. 
Static multi-tree. In the static multi-tree method, the synthesizer fixes n
and progressively increases d. To find the value of n, there is a preprocessing 
step, in which FOREST identifies patterns in the valid examples. This is done by 
first identifying substrings common to all examples. A substring is considered a 
dividing substring if it occurs exactly the same number of times and in the same 
order in all examples. Then, we split each example before and after the dividing 
substrings. Each example becomes an array of n strings. 


Example 1. Consider the valid examples from the motivating example. In these
examples, '/' is a dividing substring because it occurs in every example, and
exactly twice in each one. '0' is a common substring but not a dividing substring
because it does not occur the same number of times in all examples. After
splitting on '/', each example becomes a tuple of 5 strings:


('19', '/', '08', '/', '1996')    ('01', '/', '12', '/', '2001')
('26', '/', '10', '/', '1998')    ('29', '/', '09', '/', '2003')
('22', '/', '09', '/', '2000')    ('31', '/', '08', '/', '2015')


Then, we apply the multi-tree method with n trees. For every i ∈ {1, ..., n},
the i-th sub-tree represents a regex that matches all strings in the i-th position
of the split example tuples and the concatenation of the n regexes will match
the original example strings. Since each tree is only synthesizing a part of the
original input strings, a reduced DSL is recomputed for each tree.
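
The preprocessing step of the static multi-tree method can be sketched for the special case of single-character dividing substrings. This restriction, the all-or-nothing order check, and the function names are simplifying assumptions; the method described above works with substrings in general.

```python
import re
from collections import Counter


def dividing_chars(examples):
    """Characters occurring equally often, and in the same order, in all examples."""
    counts = [Counter(e) for e in examples]
    shared = {c for c in counts[0]
              if all(k[c] == counts[0][c] for k in counts)}
    # The sequence of candidate characters must be identical in every example.
    skeletons = {tuple(c for c in e if c in shared) for e in examples}
    return shared if len(skeletons) == 1 else set()


def split_example(example, chars):
    """Split before and after every dividing character, keeping the dividers."""
    if not chars:
        return (example,)
    pattern = "([" + "".join(re.escape(c) for c in sorted(chars)) + "])"
    return tuple(re.split(pattern, example))
```

Applied to the valid dates of Example 1, only '/' qualifies, and every example is split into a tuple of 5 strings, so n = 5.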


Dynamic multi-tree. The dynamic multi-tree method is employed when the 
examples cannot be split because there are no dividing substrings. In this sce- 
nario, the enumerator will still use a multi-tree construct to represent the regex. 
However, the number of trees is not fixed and all trees use the original, complete 
DSL. A multi-tree structure with n k-trees of depth d has n x (k? — 1) nodes. 
FOREST enumerates trees with different values of (n,d) in increasing order of 
number of nodes, starting with n = 1 and d = 2, a simple k-tree of depth 2. 
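
As a rough illustration of this enumeration order, the sketch below sorts candidate shapes (n, d) by their node count, reading the count from the text as n × (k^d − 1). The function names and the default k = 2 are assumptions made for the example only.

```python
def multi_tree_nodes(n, d, k=2):
    """Node count of n k-trees of depth d, as given in the text."""
    return n * (k ** d - 1)


def enumeration_order(max_n, max_d, k=2):
    """All shapes (n, d) with d >= 2, smallest structures first."""
    shapes = [(n, d) for n in range(1, max_n + 1) for d in range(2, max_d + 1)]
    # Ties are broken by (n, d) so that the order is deterministic.
    return sorted(shapes, key=lambda shape: (multi_tree_nodes(*shape, k=k), shape))


order = enumeration_order(max_n=3, max_d=3)
```

With these bounds the search starts at a single tree of depth 2 and tries two trees of depth 2 before one tree of depth 3.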


Pruning. We prune regexes which are provably equivalent to others in the 
search space by using algebraic rules of regular expressions like the following: 


(r*)* = r*            (r?)? = r?            (r+)+ = r+ 
(r+)* = (r*)+ = r*    (r?)* = (r*)? = r*    (r?)+ = (r+)? = r* 
(r*){m} = (r{m})*     (r+){m} = (r{m})+     (r?){m} = (r{m})? 
r{n}{m} = r{m}{n} = r{m × n} 


To prevent the enumeration of equivalent regular expressions, we add SMT 
constraints that block all but one possible representation of each regex. Take, 
for example, the equivalence (r?)+ = r*. We want to consider only one way to 
represent this regex, so we add a constraint to block the construction (r?)+ for 
any regex r. Another such equivalence results from the idempotence of union: 


r|r = r. To prevent the enumeration of expressions of the type r|r, every time 
the union operator is assigned to a node i, we force the sub-tree underneath 
i’s left child to be different from the sub-tree underneath i’s right child by at 
least one node. When we enumerate a regex that is not consistent with the 
examples, it is eliminated from the search space. Along with the incorrect regex, 
we want to eliminate regexes that are equivalent to it. The union operator in 
the regular expressions DSL is commutative: r|s = s|r, for any regexes r and 
s. Thus, whenever an expression containing r|s is discarded, we eliminate the 
expression that contains s|r in its place as well. 
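
Some of these equivalences can be sanity-checked empirically. The sketch below is not the SMT-based blocking described above; it is a bounded check, under the assumption that comparing the sets of accepted strings up to a small length over a two-letter alphabet is informative enough. The helper names are hypothetical, and r is instantiated as the concrete regex `ab`.

```python
import re
from itertools import product


def bounded_language(pattern, alphabet="ab", max_len=4):
    """All strings over `alphabet` up to `max_len` fully matched by `pattern`."""
    compiled = re.compile(pattern)
    words = (
        "".join(letters)
        for size in range(max_len + 1)
        for letters in product(alphabet, repeat=size)
    )
    return {word for word in words if compiled.fullmatch(word)}


def agree_up_to_bound(left, right):
    """True if both patterns accept exactly the same short strings."""
    return bounded_language(left) == bounded_language(right)


star_of_star = agree_up_to_bound(r"(?:(?:ab)*)*", r"(?:ab)*")    # (r*)* = r*
plus_of_option = agree_up_to_bound(r"(?:(?:ab)?)+", r"(?:ab)*")  # (r?)+ = r*
union_commutes = agree_up_to_bound(r"a|bb", r"bb|a")             # r|s = s|r
plus_equals_star = agree_up_to_bound(r"(?:ab)+", r"(?:ab)*")     # differ on ""
```

A bounded check of this kind can refute a proposed rule but cannot prove it, which is why it is only a debugging aid for a rule set.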


3.3 Regex Disambiguation 


To increase confidence in the synthesizer’s solution, FOREST disambiguates the 
specification by interacting with the user. We employ an interaction model based 
on distinguishing inputs, which has been successfully used in several synthesizers 
[11,24,25,14]. To produce a distinguishing input, we require an SMT solver with 
a regex theory, such as Z3 [15,23]. Upon finding two regexes that satisfy the 
user-provided examples, r_1 and r_2, we use the SMT solver to solve the formula: 


\[ \exists s : r_1(s) \neq r_2(s), \qquad (1) \]


where r_1(s) (resp. r_2(s)) is True if and only if r_1 (resp. r_2) matches the string s. 
A string s that satisfies (1) is a distinguishing input. FOREST asks the user to 
classify this input as valid or invalid, and s is added to the respective set of 
examples, thus eliminating either r_1 or r_2 from the search space. After the first 
interaction, the synthesis procedure continues only until the end of the current 
depth and number of trees. 
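
To make formula (1) concrete without an SMT solver, the following sketch replaces the solver query with a brute-force search: it enumerates short strings over a small alphabet and returns the first one that exactly one of the two regexes matches. This is a stand-in for illustration only; the function name, the alphabet and the length bound are assumptions.

```python
import re
from itertools import product


def distinguishing_input(r1, r2, alphabet, max_len=10):
    """Shortest string over `alphabet` matched by exactly one of r1 and r2."""
    p1, p2 = re.compile(r1), re.compile(r2)
    for size in range(max_len + 1):
        for letters in product(alphabet, repeat=size):
            s = "".join(letters)
            if bool(p1.fullmatch(s)) != bool(p2.fullmatch(s)):
                return s
    return None  # no difference found within the bound


fixed_year = r"[0-9]{2}/[0-9]{2}/[0-9]{4}"
open_year = r"[0-9]{2}/[0-9]{2}/[0-9]+"
question = distinguishing_input(fixed_year, open_year, alphabet="0/")
no_difference = distinguishing_input(r"a+", r"aa*", alphabet="a", max_len=5)
```

Here the string returned in `question` would be shown to the user; classifying it as invalid rules out the regex with the open-ended year.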


4 Capturing Groups Synthesis 


In this section we describe the synthesis procedure of the second component 
of a regex validation: a set of integer conditions over captured values that are 
satisfied by all valid examples but none of the conditional invalid examples. 


4.1 Capturing Groups Enumeration 


To enumerate capturing groups, FOREST starts by identifying the regular expres- 
sion’s atomic sub-regexes: the smallest sub-regexes whose concatenation results 
in the original complete regex. For example, [0-9]{2} is an atomic sub-regex: 
there are no smaller sub-regexes whose concatenation results in it. It does not 
make sense to place a capturing group inside atomic sub-regexes: ([0-9]){2} 
does not have a clear meaning. Once identified, the atomic sub-regexes are placed 
in an ordered list. Enumerating capturing groups over the regular expression is 
done by enumerating non-empty disjoint sub-lists of this list. The elements inside 
each sub-list form a capturing group. 


Example 2. Recall the date regex: [0-9]{2}/[0-9]{2}/[0-9]{4}. The respective 
list of atomic sub-regexes is [[0-9]{2}, /, [0-9]{2}, /, [0-9]{4}]. The 
following are examples of sub-lists of the atomic sub-regexes list and their 
resulting capturing groups: 


[[[0-9]{2}], /, [0-9]{2}, /, [0-9]{4}] → ([0-9]{2})/[0-9]{2}/[0-9]{4} 
[[[0-9]{2}], /, [[0-9]{2}], /, [[0-9]{4}]] → ([0-9]{2})/([0-9]{2})/([0-9]{4}) 
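
The enumeration of capturing groups can be sketched as follows. The code assumes that each sub-list is a contiguous run of atomic sub-regexes, which is how the example above reads; the function names are hypothetical and the atomic sub-regexes are given by hand rather than extracted from a regex.

```python
from itertools import combinations_with_replacement


def group_placements(atoms, num_groups):
    """Yield lists of disjoint, non-empty, contiguous spans (start, end)."""
    n = len(atoms)
    # Sorted boundaries b0 <= b1 <= b2 <= ... are paired up into spans.
    for bounds in combinations_with_replacement(range(n + 1), 2 * num_groups):
        spans = list(zip(bounds[0::2], bounds[1::2]))
        if all(start < end for start, end in spans):
            yield spans


def render(atoms, spans):
    """Concatenate the atoms, wrapping every span in a capturing group."""
    starts = {start for start, _ in spans}
    ends = {end for _, end in spans}
    pieces = []
    for i, atom in enumerate(atoms):
        if i in starts:
            pieces.append("(")
        pieces.append(atom)
        if i + 1 in ends:
            pieces.append(")")
    return "".join(pieces)


atoms = ["[0-9]{2}", "/", "[0-9]{2}", "/", "[0-9]{4}"]
one_group = [render(atoms, spans) for spans in group_placements(atoms, 1)]
three_groups = [render(atoms, spans) for spans in group_placements(atoms, 3)]
```

Calling `group_placements` with 1, 2, 3, ... groups in turn gives the increasing-number order used when the required number of groups is unknown.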


4.2 Capture Conditions Synthesis 


To compute capture conditions, we need all conditional invalid examples to be 
matched by the regular expression. Afterwards, capturing groups are enumerated as 
described in section 4.1. The number of necessary capturing groups is not known 
beforehand, so we enumerate capturing groups in increasing number. 

A capture condition is a 3-tuple: it contains the captured text, an integer 
comparison operator and an integer argument. FOREST considers only two integer 
comparison operators, ≤ and ≥. However, the algorithm can be easily expanded 
to include other operators. Let C be a set of capturing groups and C(x) the 
integer captures that result from applying C to example string x. Let D_C be the 
set of all possible capture conditions over capturing groups C. D_C results from 
combining each capturing group with each integer operator. Finally, let V be 
the set of all valid examples, I the set of all conditional invalid examples, and 
X = V ∪ I the union of these two sets. 

Given capturing groups C, FOREST uses Maximum Satisfiability Modulo Theories 
(MaxSMT) to select from D_C the minimum set of conditions that are satisfied 
by all valid examples and none of the conditional invalid examples. To encode the 
problem, we define two sets of Boolean variables. First, we define s_{cap,x}, for 
every cap ∈ C(x) and x ∈ X. s_{cap,x} = True if capture cap in example x satisfies 
all used conditions that refer to it. We also define u_{cond} for all cond ∈ D_C. 
u_{cond} = True means condition cond is used in the solution. Additionally, we 
define a set of integer variables b_{cond}, for all conditions cond ∈ D_C, that 
represent the integer argument present in each condition. 

Let SMT(cond, x) be the SMT representation of condition cond for example 
x: the capture is an integer value, and the integer argument is the corresponding 
b_{cond} variable. Let D_{cap} ⊆ D_C be the set of capture conditions that refer to 
capture cap. Constraint (2) states that a capture cap in example x satisfies all 
conditions if and only if for every condition that refers to cap either it is not used 
in the solution or it is satisfied for the value of that capture in that example: 


\[ s_{cap,x} \leftrightarrow \bigwedge_{cond \in D_{cap}} \left( u_{cond} \rightarrow SMT(cond, x) \right). \qquad (2) \]


Example 3. Recall the first valid string from the motivating example: x_0 = 
“19/08/1996”. Suppose FOREST has already synthesized the desired regular 
expression and enumerated a capturing group that corresponds to the day: 
([0-9]{2})/[0-9]{2}/[0-9]{4}. Let cond_0 and cond_1 be the conditions that 
refer to the first (and only) capturing group, $0, and operators ≤ and ≥, 
respectively. The SMT representation for cond_0 and x_0 is SMT(cond_0, x_0) = 
19 ≤ b_{cond_0}. Constraint (2) is: 


\[ s_{\$0,x_0} \leftrightarrow (u_{cond_0} \rightarrow 19 \leq b_{cond_0}) \wedge (u_{cond_1} \rightarrow 19 \geq b_{cond_1}). \]


Then, we ensure the used conditions are satisfied by all valid examples and 
none of the conditional invalid examples: 


\[ \bigwedge_{x \in V} \bigwedge_{cap \in C(x)} s_{cap,x} \;\wedge\; \bigwedge_{x \in I} \bigvee_{cap \in C(x)} \neg s_{cap,x}. \qquad (3) \]


Since we are looking for the minimum set of capture conditions, we add soft 
clauses to penalize the usage of capture conditions in the solution: 


\[ \bigwedge_{cond \in D_C} \neg u_{cond}. \qquad (4) \]


We consider part of the solution only the capture conditions whose u_{cond} 
is True in the resulting SMT model. We also extract the values of the integer 
arguments in each condition from the model values of the b_{cond} variables. 
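
The effect of this encoding can be imitated on small inputs without a MaxSMT solver. The sketch below is a brute-force stand-in, not the encoding itself: it assumes non-strict bounds (≤ and ≥), fixes each integer argument to the tightest value allowed by the valid examples, and then looks for the smallest subset of conditions that rejects every conditional invalid example. All names and the sample captures are hypothetical.

```python
from itertools import combinations


def holds(condition, captures):
    """Evaluate one condition (capture index, operator, bound) on a tuple of captures."""
    index, op, bound = condition
    value = captures[index]
    return value <= bound if op == "<=" else value >= bound


def minimal_conditions(valid, invalid):
    """Smallest set of tight bounds met by all of `valid` and broken by each of `invalid`."""
    candidates = []
    for i in range(len(valid[0])):
        values = [captures[i] for captures in valid]
        candidates.append((i, "<=", max(values)))  # tightest upper bound
        candidates.append((i, ">=", min(values)))  # tightest lower bound
    for size in range(len(candidates) + 1):
        for chosen in combinations(candidates, size):
            # Every conditional invalid example must break at least one condition.
            if all(any(not holds(c, bad) for c in chosen) for bad in invalid):
                return list(chosen)
    return None


# Captures (day, month) of valid dates and of two conditional invalid ones.
valid_captures = [(19, 8), (26, 10), (22, 9), (31, 8)]
invalid_captures = [(33, 10), (19, 15)]
solution = minimal_conditions(valid_captures, invalid_captures)
```

Unlike this sketch, the MaxSMT formulation leaves the integer arguments free, so the solver may return looser bounds; that freedom is what the disambiguation step later resolves.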


4.3 Capture Conditions Disambiguation 


To ensure the solution meets the user’s intent, FOREST disambiguates the spec- 
ification using, once again, a procedure based on distinguishing inputs. Once 
FOREST finds two different sets of capture conditions S_1 and S_2 that satisfy the 
specification, we look for a distinguishing input: a string c which satisfies all 
capture conditions in S_1, but not those in S_2, or vice-versa. First, to simplify 
the problem, FOREST eliminates from S_1 and S_2 conditions which are present 
in both: these are not relevant to compute a distinguishing input. Let S_1^* (resp. 
S_2^*) be the subset of S_1 (resp. S_2) containing only the distinguishing conditions, 
i.e., the conditions that differ from those in S_2 (resp. S_1). 

We do not compute the distinguishing string c directly. Instead, we compute 
the integer value of the distinguishing captures in c, i.e., the captures that 
result from applying the regular expression and its capturing groups to the 
distinguishing input string. We define |C| integer variables, c_i, which correspond 
to the values of the distinguishing captures: c_0, c_1, ..., c_{|C|-1} = C(c). 

As before, let SMT(cond, c) be the SMT representation of each condition 
cond. Each capture in C(c) is represented by its respective c_i, the operator 
maintains its usual semantics and the integer argument is its value in the solution 
to which the condition belongs. Constraint (5) states that c satisfies the conditions 
in one solution but not the other. 


\[ \bigwedge_{cond \in S_1^*} SMT(cond, c) \neq \bigwedge_{cond \in S_2^*} SMT(cond, c). \qquad (5) \]


In the end, to produce the distinguishing string c, FOREST picks an example 
from the valid set, applies the regular expression with the capturing groups to 
it, and replaces its captures with the model values for c_i. 

FOREST asks the user to classify c as valid or invalid. Depending on the 
user’s answer, c is added as a valid or conditional invalid example, effectively 
eliminating either S_1 or S_2 from the search space. 


Example 4. Recall the examples from the motivating example. No example 
invalidates a date with the day 32, so FOREST will find two correct sets of 
capture conditions over the regular expression ([0-9]{2})/([0-9]{2})/[0-9]{4}: 
S_1 = {$0 ≤ 31, $0 ≥ 1, $1 ≤ 12, $1 ≥ 1}, and S_2 = {$0 ≤ 32, $0 ≥ 1, $1 ≤ 
12, $1 ≥ 1}. First, we define two sets containing only the distinguishing 
conditions: S_1^* = {$0 ≤ 31} and S_2^* = {$0 ≤ 32}. Then, to find c_0, the value 
of the distinguishing capture for these solutions, we solve the constraint: 


\[ \exists c_0 : (c_0 \leq 31) \neq (c_0 \leq 32) \]


and get the value c_0 = 32 which satisfies S_2^* (and S_2), but not S_1^* (or S_1). 

If we pick the first valid example, “19/08/1996”, as basis for c, the respective 
distinguishing input is c = “32/08/1996”. Once the user classifies c as invalid, c 
is added as a conditional invalid example and S_2 is removed from consideration. 
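
The steps of this example can be reproduced with a short sketch. It takes the comparison operators to be the non-strict ≤ and ≥, replaces the SMT query for constraint (5) with a search over a small integer range, and pads substituted values to the width of the original capture; the function names and the search range are assumptions.

```python
import re
from itertools import product


def holds(condition, values):
    """Evaluate one condition (capture index, operator, bound) on capture values."""
    index, op, bound = condition
    return values[index] <= bound if op == "<=" else values[index] >= bound


def distinguishing_captures(s1, s2, base, search=range(0, 100)):
    """Capture values that satisfy exactly one of the two differing condition sets."""
    only1, only2 = s1 - s2, s2 - s1  # drop the conditions present in both
    indices = sorted({condition[0] for condition in only1 | only2})
    for combo in product(search, repeat=len(indices)):
        values = list(base)
        for index, value in zip(indices, combo):
            values[index] = value
        in_first = all(holds(c, values) for c in only1)
        in_second = all(holds(c, values) for c in only2)
        if in_first != in_second:
            return values
    return None


def substitute(regex, example, values):
    """Replace the captures of `example` with `values`, keeping their width."""
    match = re.fullmatch(regex, example)
    out, last = [], 0
    for group, value in enumerate(values, start=1):
        start, end = match.span(group)
        out.append(example[last:start])
        out.append(str(value).zfill(end - start))
        last = end
    out.append(example[last:])
    return "".join(out)


regex = r"([0-9]{2})/([0-9]{2})/[0-9]{4}"
s1 = {(0, "<=", 31), (0, ">=", 1), (1, "<=", 12), (1, ">=", 1)}
s2 = {(0, "<=", 32), (0, ">=", 1), (1, "<=", 12), (1, ">=", 1)}
values = distinguishing_captures(s1, s2, base=[19, 8])
c = substitute(regex, "19/08/1996", values)
```

Only the capture mentioned in the differing conditions changes; the month keeps the value it has in the chosen valid example.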


5 Related Work 


Program synthesis has been successfully used in many domains such as string 
processing [8,19,7,26], query synthesis [11,25,17], data wrangling [2,5], and func- 
tional synthesis [3,6]. In this section, we discuss prior work on the synthesis of 
regular expressions [10,1] that is most closely related to our approach. 
Previous approaches that perform general string processing [7,26] restrict the 
form of the regular expressions that can be synthesized. In contrast, we support 
a wide range of regular expressions operators, including the Kleene closure, pos- 
itive closure, option, and range. More recent work that targets the synthesis of 
regexes is done by ALPHAREGEX [10] and REGEL [1]. ALPHAREGEX performs 
an enumerative search and uses under- and over-approximations of regexes to 
prune the search space. However, ALPHAREGEX is limited to the binary alpha- 
bet and does not support the kind of regexes that we need to synthesize for 
form validations. REGEL [1] is a state-of-the-art synthesizer of regular expres- 
sions based on a multi-modal approach that combines input-output examples 
with a natural language description of user intent. They use natural language 
to build hierarchical sketches that capture the high-level structure of the regex 
to be synthesized. In addition, they prune the search space by using under- and 
over-approximations and symbolic regexes combined with SMT-based reasoning. 
REGEL’s evaluation [1] has shown that their PBE engine is an order of magni- 
tude faster than ALPHAREGEX. While REGEL targets more general regexes that 
are suitable for search and replace operations, we target regexes for form 
validation which usually have more structure. In our approach, we take advantage 
of this structure to split the problem into independent subproblems. This can 
be seen as a special case of sketching [22] where each hole is independent. Our 
pruning techniques are orthogonal to the ones used by REGEL and are based on 
removing equivalent regexes prior to the search and removing equivalent failed 
regexes during search. To the best of our knowledge, no previous work has focused 
on the synthesis of conditions over capturing groups. 

Instead of using input-output examples, there are other approaches that syn- 
thesize regexes solely from natural language [9,12,27]. We see these approaches as 
orthogonal to ours and expect that FOREST can be improved by hints provided 
by a natural language component such as was done in REGEL. 


6 Experimental Results 


Implementation. FOREST is open-source and publicly available at 
https://github.com/Marghrid/FOREST. FOREST is implemented in Python 3.8 on top of 
TRINITY, a general-purpose synthesis framework [13]. All SMT formulas are solved 
using the Z3 SMT solver, version 4.8.9 [15]. To find distinguishing inputs in reg- 
ular expression synthesis, FOREST uses Z3’s theory of regular expressions [23]. 
To check the enumerated regexes against the examples, we use Python’s regex li- 
brary [18]. The results presented herein were obtained using an Intel(R) Xeon(R) 
Silver 4110 CPU @ 2.10GHz, with 64GB of RAM, running Debian GNU/Linux 10. 
All processes were run with a time limit of one hour. 


Benchmarks. To evaluate FOREST, we used 64 benchmarks based on real-world 
form-validation regular expressions. These were collected from regular expres- 
sion validators in validation frameworks and from regexlib [20], where users 
can upload their own regexes. Among these 64 benchmarks there are different 
formats: national IDs, identifiers of products, date and time, vehicle registration 
numbers, postal codes, email and phone numbers. For each benchmark, we gen- 
erated a set of string examples. All 64 benchmarks require a regular expression 
to validate the examples, but only 7 require capture conditions. On average, 
each instance is composed of 13.2 valid examples (ranging from 4 to 33) and 9.3 
invalid (ranging from 2 to 38). The 7 instances that target capture conditions 
have on average 6.3 conditional invalid examples (ranging from 4 to 8). 


The goal of this experimental evaluation is to answer the following questions: 

Q1: How does FOREST compare against REGEL? (section 6.1) 

Q2: How does pruning affect multi-tree’s time performance? (section 6.2) 

Q3: How does static multi-tree improve on dynamic multi-tree? (section 6.2) 

Q4: How does multi-tree compare against other encodings? (section 6.3) 

Q5: How many examples are required to return a correct solution? (section 6.4) 

FOREST, by default, uses static multi-tree (when possible) with pruning. It 
correctly solves 31 benchmarks (48%) in under 10 seconds. In one hour, FOREST 
solves 47 benchmarks (73%), with 96% accuracy: only two solutions did not 
correspond to the desired regex validation. FOREST disambiguates only among 
programs at the same depth, and so if the first solution is not at the same depth 
as the correct one, the correct solution is never found. After 1 hour of running 
time, FOREST is interrupted, but it prints its current best validation before 
terminating. After the timeout, FOREST returned 3 more regexes, 2 of which were 
the correct solution for the benchmark. In all benchmarks for which FOREST returns 
a solution, the first matching regular expression is found in under 10 minutes. In 
40 benchmarks, the first regex is found in under 10 seconds. The rest of the time 
is spent disambiguating the input examples. FOREST interacts with the user to 
disambiguate the examples in 27 benchmarks. Overall, it asks 1.8 questions and 
spends 38.6 seconds computing distinguishing inputs, on average. 


Table 1: Comparison of time performance using different synthesis methods 

Timeout (s)                               10    60    3600 
FOREST (with interaction)                 31    39      47 
FOREST’s 1st regex (no interaction)       40    46      50 
Multi-tree w/o pruning                    20    32      38 
Dynamic-only multi-tree                    5    10      18 
k-tree                                     4     9      15 
Line-based (w/o pruning)                   4     4      12 
REGEL                                     29    38      47 
REGEL PBE                                  5   [?]      23 


Figure 5: Instances solved using different methods 
(plot of time, from 0 to 3,600, against instances solved, from 0 to 64, with one 
line per method: Line-based, k-tree, Dynamic multi-tree, REGEL PBE, Multi-tree 
w/o pruning, REGEL, FOREST, FOREST’s 1st regex) 

Regarding the synthesis of capture conditions, in 5 of the benchmarks, we 
need only 2 capturing groups and at most 4 conditions. In these instances, the 
conditions’ synthesis takes under 2 seconds. The remaining 2 benchmarks need 4 
capturing groups and take longer: 99 seconds to synthesize 4 conditions and 1068 
seconds for 6 conditions. During capture conditions synthesis, FOREST interacts 
7.14 times and takes 0.1 seconds to compute distinguishing inputs, on average. 

Table 1 shows the number of instances solved in under 10, 60 and 3600 
seconds using FOREST, as well as using the different variations of the synthesizer 
which will be described in the following sections. The cactus plot in Figure 5 
shows the cumulative synthesis time on the y-axis plotted against the number of 
benchmarks solved by each variation of FOREST (on the x-axis). The synthesis 
methods that correspond to lines more to the right of the plot are able to solve 
more benchmarks in less time. We also compare solving times with REGEL [1]. 
REGEL takes as input examples and a natural language description of user intent. We 
consider not only the complete REGEL synthesizer, but also the PBE engine of 
REGEL by itself, which we denote by REGEL PBE. 


6.1 Comparison with REGEL 


As mentioned in section 5, REGEL’s synthesis procedure is split into two steps: 
sketch generation (using a natural language description of desired behavior) and 
sketch completion (using input-output examples). To compare REGEL and FOR- 
EST, we extended our 64 form validation benchmarks with a natural language 
description. To assess the importance of the natural language description, we 
also ran REGEL using only its PBE engine. Sketch generation took on average 
60 seconds per instance, and successfully generated a sketch for 63 instances. 
The remaining instance was run without a sketch. We considered only the high- 
est ranked sketch for each instance. In Table 1 we show how many instances can 
be solved with different time limits for sketch completion; note that these values 
do not include the sketch generation time. REGEL returned a regular expression 
for 47 instances within the time limit. Since REGEL does not implement a dis- 
ambiguation procedure, the returned regular expression does not always exhibit 
the desired behavior, even though it correctly classifies all examples. Of the 47 
synthesized expressions, 31 exhibit the desired intent. This is a 66% accuracy, 
which is the same as FOREST without disambiguation (FOREST’s 1st regex) but 
it is much lower than FOREST with disambiguation at 96%. We also observe that 
REGEL’s performance is severely impaired when using only its PBE engine. 

51 out of the 63 generated sketches are of the form □{S_1, ..., S_n}, where each 
S_i is a concrete sub-regex, i.e., has no holes. This construct indicates the desired 
regex must contain at least one of S_1, ..., S_n, and contains no information about 
the top-level operators that are used to connect them. 22 of the 47 synthesized 
regexes are based on sketches of that form, and they result from the direct 
concatenation of all components in the sketch. No new components are generated 
during sketch completion. Thus, most of REGEL’s sketches could be integrated 
into FOREST, whose multi-tree structure holds precisely those top-level operators 
that were missing from REGEL’s sketches. 


6.2 Impact of pruning the search space and splitting examples 


To evaluate the impact of pruning the search space as described in section 3.2, we 
ran FOREST with all pruning techniques disabled. In the scatter plot in Figure 6a, 
we can compare the solving time on each benchmark with and without pruning. 
Each mark in the plot represents an instance. The value on the y-axis shows 
the synthesis time of multi-tree with pruning disabled and the value on the x- 
axis the synthesis time with pruning enabled. The marks above the y = x line 
(also represented in the plot) represent problems that took longer to synthesize 
without pruning than with pruning. On average, with pruning, FOREST can 
synthesize regexes in 42% of the time and enumerates about 15% of the regexes 
before returning. There is no significant change in the number of interactions 
before returning the desired solution. 


Figure 6: Comparison of synthesis time using different variations of FOREST. 

FOREST is able to split the examples and use static multi-tree as described in 
section 3.2 in 52 benchmarks (81%). The remaining 12 are solved using dynamic 
multi-tree. To assess the impact of using static multi-tree we ran FOREST with a 
version of the multi-tree enumerator that does not split the examples, and jumps 
directly to dynamic multi-tree solving. In the scatter plot in Figure 6b, we com- 
pare the solving times of each benchmark. Using static multi-tree when possible, 
FOREST requires, on average, less than two thirds of the time (59.1%) to return 
the desired regex for benchmarks solved by both methods. Furthermore, with 
static multi-tree FOREST can synthesize more complex regexes: the maximum 
number of nodes in a solution returned by dynamic multi-tree is 12 (avg. 6.7), 
while complete multi-tree synthesizes regexes of up to 24 nodes (avg. 10.3). 


6.3 Multi-tree versus k-tree and line-based encodings 


To evaluate the performance of multi-tree enumeration, we ran FOREST with two 
other enumeration encodings: k-tree and line-based. The latter is a state-of-the-art 
encoding for the synthesis of SQL queries [17]. k-tree is the default enumerator 
in TRINITY [13], and the line-based enumerator is available in SQUARES [16]. 
The k-tree encoding has a very similar structure to that of multi-tree, so our 
pruning techniques were easily applied to this encoding. On the other hand, 
line-based encoding is intrinsically different, so the pruning techniques were not 
implemented. We compare the line-based encoding to multi-tree without prun- 
ing. In every other aspect, the three encodings were run in the same conditions, 
using FOREST’s regex DSL. k-tree is able to synthesize programs with up to 
10 nodes, while the line-based encoding synthesizes programs of up to 9 nodes. 
Neither encoding outperforms multi-tree. 

As seen in Table 1, line-based encoding does not outperform the tree-based 
encodings for the domain of regexes while it was much better for the domain of 
SQL queries [17]. We conjecture this disparity arises from the different nature 
of DSLs. Most SQL queries, when represented as a tree, leave many branches of 
the tree unused, which results in a much larger tree and SMT encoding. 


6.4 Impact of fewer examples 


To assess the impact of providing fewer examples on the accuracy of the solution, 
we ran FOREST with modified versions of each benchmark. First, each benchmark 
was run with at most 10 valid and 10 invalid examples, chosen randomly among 
all examples. Conditional invalid examples are already very few per instance, so 
these were not altered. The accuracy of the returned regexes is slightly lower. 
With only 10 valid and 10 invalid examples, FOREST returns the correct regex 
in 93.5% of the benchmarks, which represents a decrease of only 2.5 percentage 
points relative to the results with all examples. We also saw an increase in the 
number of interactions before returning, since fewer examples are likely to be more 
ambiguous. With only 10 examples of each kind, FOREST interacts on average 2.2 
times per benchmark, which represents an increase of about a fifth. The increase in 
the number of interactions is reflected in a small increase in the synthesis time 
(less than 1%). Afterwards, we reduced the number of examples even further: only 
5 valid and 5 invalid. The accuracy of FOREST in this setting was reduced to 71%. 
On average, it interacted 4.3 times per benchmark, which is over two times more 
than before. 


7 Conclusions and Future Work 


Regexes are commonly used to enforce patterns and validate the input fields of 
digital forms. However, writing regex validations requires specialized knowledge 
that not all users possess. We have presented a new algorithm for synthesis of 
regex validations from examples that leverages the common structure shared 
between valid examples. Our experimental evaluation shows that the multi-tree 
representation synthesizes three times more regexes than previous representa- 
tions in the same amount of time and, together with the user interaction model, 
FOREST solves 70% of the benchmarks with the correct user intent. We verified 
that FOREST maintains a very high accuracy with as few as 10 examples of each 
kind. We also observed that our approach outperforms REGEL, a state-of-the-art 
synthesizer, in the domain of form validations. 

As future work, we would like to explore the synthesis of more complex 
capture conditions, such as conditions depending on more than one capture. 
This would allow more restrictive validations; for example, in a date, the possible 
values for the day could depend on the month. Another possible extension to 
FOREST is to automatically separate invalid from conditional invalid examples, 
making this distinction imperceptible to the user. 


Abstract. Parametric Markov chains (pMCs) are Markov chains with 
symbolic (aka: parametric) transition probabilities. They are a convenient 
operational model to treat robustness against uncertainties. A typical 
objective is to find the parameter values that maximize the reachability 
of some target states. In this paper, we consider automatically proving 
robustness, that is, an ε-close upper bound on the maximal reachability 
probability. The result of our procedure actually provides an almost- 
optimal parameter valuation along with this upper bound. 

We propose to tackle these ETR-hard problems by a tight combination 
of two significantly different techniques: monotonicity checking and pa- 
rameter lifting. The former builds a partial order on states to check 
whether a pMC is (locally or globally) monotonic in a certain parameter, 
whereas parameter lifting is an abstraction technique based on the itera- 
tive evaluation of pMCs without parameter dependencies. We explain our 
novel algorithmic approach and experimentally show that we significantly 
improve the time to determine almost-optimal synthesis. 


1 Introduction 


Background and problem setting. Probabilistic model checking is a well- 
established field and has various applications but assumes probabilities to be 
fixed constants. To deal with uncertainties, symbolic parameters are used. Para- 
metric Markov chains (pMCs, for short) define a family of Markov chains with 
uncountably many family members, called instantiations, by having symbolic 
(aka: parametric) transition probabilities [10,22]. We are interested in determining 
optimal parameter settings: which instantiation meets a given objective the best? 
The typical objective is to maximize the reachability probability of a set of target 
states. This question is inspired by practical applications such as: what are the 
optimal parameter settings in randomised controllers to minimise power consump- 
tion?, and what is the optimal bias of coins in a randomised distributed algorithm 
to maximise the chance of achieving mutual exclusion? For most applications, 
it suffices to achieve parameters that attain a given quality of service that is 
ε-close to the unknown optimal solution. More precisely, this paper concentrates 
on automatically proving ε-robustness, i.e., determining an upper bound which is 
ε-close to the maximal reachability probability. The by-product of our procedure 
actually provides an almost-optimal parameter valuation too. 


Existing parameter synthesis techniques. Efficient techniques have been developed 
in recent years for the feasibility problem: given a parametric Markov chain, and 
a reachability objective, find an instantiation that reaches the target with at 
least a given probability. To solve this problem, it suffices to “guess” a correct 
family member, i.e., a correct parameter instantiation. Verifying the “guessed” 
instantiation against the reachability objective is readily done using off-the- 
shelf Markov chain model-checking algorithms. Most recent progress is based on 
advanced techniques that make informed guesses: these range from sampling 
techniques [14] and guided sampling such as particle swarm optimisation [7] to 
greedy search [24] and solving different variants of a convex optimisation problem 
around a sample [3,9]. Sampling has been accelerated by reusing previous model 
checking results or by just-in-time compilation of the parameter function [12]. 
These methods are inherently inadequate for finding optimal parameter settings. 
To the best of our knowledge, optimal parameter synthesis has received scant 
attention so far. A notable exception is the analysis (e.g., using SMT techniques) of 
rational functions, typically obtained by some form of state elimination [10,12,15], 
that symbolically represent reachability probabilities in terms of the parameters. 
These functions are exponential in the number of parameters and become 
infeasible for more than two parameters. Parameter lifting [25] remedies this 
by using an abstraction technique, but due to an exponential blow-up of region 
splitting, is limited to a handful of parameters. The challenge is to solve optimal 
parameter synthesis problems with more parameters. 
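
The "guess and verify" scheme for the feasibility problem can be illustrated on a toy example. The four-state chain below is hypothetical and not taken from the paper, value iteration stands in for an off-the-shelf model checker, and the guesses come from a plain grid over the open interval (0, 1).

```python
def reachability(p, iterations=200):
    """Probability of reaching the target from s0 in a toy pMC.

    Hypothetical chain: s0 -p-> s1,          s0 -(1-p)-> sink,
                        s1 -(1-p)-> target,  s1 -p-> s0.
    """
    x0 = x1 = 0.0
    for _ in range(iterations):
        # One step of value iteration on x0 = p*x1 and x1 = (1-p) + p*x0.
        x0, x1 = p * x1, (1 - p) + p * x0
    return x0


def feasible_instantiation(threshold, samples=9):
    """First sampled parameter value whose instantiation meets the threshold."""
    for i in range(samples):
        p = (i + 1) / (samples + 1)  # 0.1, 0.2, ..., 0.9
        if reachability(p) >= threshold:
            return p
    return None


witness = feasible_instantiation(0.3)
no_witness = feasible_instantiation(0.6)
```

For this chain the reachability probability works out to p / (1 + p) for 0 < p < 1, so a sample such as p = 0.5 witnesses feasibility for the threshold 0.3. A failed search, as for 0.6, proves nothing about the bound by itself, which is the gap that optimal synthesis has to close.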


Approach. We propose to tackle the optimal synthesis problem by a deep integration 
of two seemingly unrelated techniques: monotonicity checking and 
parameter lifting [25]. The former builds a partial order on the state space to 
check whether a pMC is (locally or globally) monotonic in a certain parameter, while 
the latter is an abstraction technique that “lifts” the parameter dependencies, 
obtaining interval MCs [21], and solves them in an iterative manner. To construct 
an efficient combination, we extend both methods such that they profit from 
each other. This is done by combining them with a tailored divide-and-conquer 
component, see Fig. 1. To prove bounds on the induced reachability probability, 
parameter lifting has been the undisputed state-of-the-art, despite the increased 
attention that parameter synthesis has received over recent years. This paper 
improves parameter lifting with more advanced reasoning capabilities that involve 
properties of the derivative, rather than the actual probabilities. These reason- 
ing methods enable reducing the exponent of the inherently exponential-time 
procedure. This conceptual advantage is joined with various engineering efforts. 
Parameter lifting is accelerated by using side products of monotonicity analysis 
such as local monotonicity and shrunk parameter regions. Furthermore, bounds 
obtained by parameter lifting are used to obtain a cheap rule accelerating the 
monotonicity checker. The interplay between the two advanced techniques is 
tricky and requires a careful treatment. 


(diagram: Parameter Lifting (Sect. 5), Divide and Conquer (Sect. 6) and 
Monotonicity Checking (Sect. 4), exchanging regions, region values, monotone 
parameters, state bounds and local monotonicity) 
Fig. 1. The symbiosis of parameter lifting and monotonicity checking. Red are new 
interactions, compared to earlier work. Details are given in Sect. 3. 

Note that we are not the first to exploit monotonicity in the context of 
pMCs. Hutschenreiter et al. showed that the complexity of model checking (a 
monotone fragment of) PCTL on monotonic pMC is lower than on general pMCs. 
Pathak et al. provided an efficient greedy approach to repair monotonic 
pMCs. Recently, Gouberman et al. used monotonicity for hitting probabilities 
in perturbed continuous-time MCs. 


Experimental results. We realised the integrated approach on top of the
Storm model checker. Experiments on several benchmarks show that opti- 
mal synthesis is possible: (1) on benchmarks with up to about a few hundred 
parameters, (2) on benchmarks that cannot be handled without monotonicity, 
(3) while accelerating pure parameter lifting by up to two orders of magnitude. 
Our approach induces a bit of overhead on small instances for some benchmarks, 
and starts to pay off when increasing the number of parameters. 


Main contribution. In summary, the main contribution of this paper is a tight 
integration of parameter lifting and monotonicity checking. Experiments indicate 
that this novel combination substantially improves upon the state-of-the-art in 
optimal parameter synthesis. 


Organisation of the paper. Section 2 provides the necessary technical background
and formalises the problem. Section 3 explains the approach, in particular the
meaning of the arrows in Fig. 1. Section 4 discusses how state bounds can
be exploited in the monotonicity checker. Section 5 details how to exploit local
monotonicity in parameter lifting. Section 6 then considers the tight interplay via
the divide-and-conquer method. Section 7 reports on the experimental results of
our prototypical implementation in Storm, while Section 8 concludes the paper.


2 Problem Statement 


A probability distribution over a finite or countably infinite set $X$ is a function
$\mu\colon X \to [0,1] \subseteq \mathbb{R}$ with $\sum_{x \in X} \mu(x) = 1$. The set of all distributions on $X$ is
denoted by $\mathit{Distr}(X)$. Let $\vec{a} \in \mathbb{R}^n$ denote $(a_1, \ldots, a_n)$. The set of multivariate
polynomials over ordered variables $\vec{x} = (x_1, \ldots, x_n)$ is denoted $\mathbb{Q}[\vec{x}]$. For a
polynomial $f$ and variable $x$, we write $x \in f$ if the variable occurs in the
polynomial $f$. An instantiation for a finite set $V$ of real-valued variables is a
function $u\colon V \to \mathbb{R}$. We often denote $u$ as a vector $\vec{u} \in \mathbb{R}^n$ with $u_i := u(x_i)$ for
$x_i \in V$. A polynomial $f$ can be interpreted as a function $f\colon \mathbb{R}^n \to \mathbb{R}$, where $f(\vec{u})$
is obtained by substitution, i.e., $f[\vec{x} \leftarrow \vec{u}]$, where each occurrence of $x_i$ in $f$ is
replaced by $u(x_i)$.


Definition 1 (pMC). A parametric Markov Chain (pMC) is a tuple $M =
(S, s_I, T, V, P)$ with a finite set $S$ of states, an initial state $s_I \in S$, a finite set
$T \subseteq S$ of target states, a finite set $V$ of real-valued variables (parameters) and
a transition function $P\colon S \times S \to \mathbb{Q}[V]$.


A pMC $M$ is a (discrete-time) Markov chain (MC) if the transition function
yields well-defined probability distributions, i.e., $P(s, \cdot) \in \mathit{Distr}(S)$ for each $s \in S$.
Applying an instantiation $\vec{u}$ to a pMC $M$ yields $M[\vec{u}]$ by replacing each $f \in \mathbb{Q}[V]$
in $M$ by $f(\vec{u})$. An instantiation $\vec{u}$ is well-defined (for $M$) if $M[\vec{u}]$ is an MC.
A well-defined instantiation $\vec{u}$ is graph-preserving (for $M$) if the topology is
preserved, i.e., $P(s,s') \neq 0$ implies $P(s,s')(\vec{u}) \neq 0$ for all states $s$ and $s'$. A set
of instantiations is called a region. A region $R$ is well-defined (graph-preserving)
if $\vec{u}$ is well-defined (graph-preserving) for all $\vec{u} \in R$. In this paper, we consider
only graph-preserving regions.

For a parameter-free MC $M$, $\mathrm{Pr}^{s}_{M}(\lozenge T) \in [0,1] \subseteq \mathbb{R}$ denotes the probability
that from state $s$ the target $T$ is eventually reached. For a formal definition, we
refer to, e.g., [4, Ch. 10]. For pMC $M$, $\mathrm{Pr}^{s}_{M}(\lozenge T)$ is not a constant, but rather a
function $\mathrm{Pr}^{s \to T}_{M}\colon V \to [0,1]$, with $\mathrm{Pr}^{s \to T}_{M}(\vec{u}) = \mathrm{Pr}^{s}_{M[\vec{u}]}(\lozenge T)$. The closed form of
$\mathrm{Pr}^{s \to T}_{M}$ on a graph-preserving region is a rational function over $V$, i.e., a fraction
of two polynomials over $V$. On a graph-preserving region, the function $\mathrm{Pr}^{s \to T}_{M}$
is continuously differentiable [25]. We call $\mathrm{Pr}^{s \to T}_{M}$ the solution function, and for
conciseness, we often omit the subscript $M$. Graph-preserving instantiations
$\vec{u}, \vec{u}'$ preserve zero-one probabilities, i.e., $\mathrm{Pr}^{s \to T}(\vec{u}) = 0$ implies $\mathrm{Pr}^{s \to T}(\vec{u}') = 0$,
and analogously for $= 1$. We simply write $\mathrm{Pr}^{s \to T} = 0$ (or $= 1$). Let ☺ (☹) denote all
states $s \in S$ with $\mathrm{Pr}^{s \to T} = 1$ ($\mathrm{Pr}^{s \to T} = 0$). By a standard preprocessing [4], we
may safely assume a single ☺ and a single ☹ state.


Problem statement. This paper is concerned with the following questions for a 
given pMC M with target states T, and region R: 


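Optimal synthesis. Find the instantiation $\vec{u}^*$ such that

$$\vec{u}^* = \arg\max_{\vec{u} \in R} \mathrm{Pr}_{M[\vec{u}]}(\lozenge T).$$

ε-Robustness. Given tolerance $\varepsilon > 0$, find an instantiation $\vec{u}^*$ such that

$$\max_{\vec{u} \in R} \mathrm{Pr}_{M[\vec{u}]}(\lozenge T) - \varepsilon \;\leq\; \mathrm{Pr}_{M[\vec{u}^*]}(\lozenge T) \;\leq\; \max_{\vec{u} \in R} \mathrm{Pr}_{M[\vec{u}]}(\lozenge T).$$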
[Figure 2, panels: (a) $M_1$; (b) $M_1[\vec{u}]$ with $\vec{u} = \{p \mapsto 1/3\}$; (c) $M_2$.]

Fig. 2. Toy examples for pMCs.


The optimal synthesis problem is ETR-hard [23], i.e., as hard as finding a
root of a multivariate polynomial. It is thus NP-hard and in PSPACE. The same
applies to ε-robustness. The value $\lambda = \mathrm{Pr}_{M[\vec{u}^*]}(\lozenge T)$ can be viewed as the optimal reachability
probability of $T$ (up to the robustness tolerance $\varepsilon$) over all possible parameter
values, while $\vec{u}^*$ is the instantiation that maximises the probability to reach $T$.

Like [28], we assume pMCs to be simple, i.e., $P(s,s') \in \{x, 1-x \mid x \in V\} \cup \mathbb{Q}$
for all $s, s' \in S$ and $\sum_{s'} P(s,s') = 1$. Theoretically, the above problem for simple
pMCs is as hard as for general pMCs, and practically, most pMCs are simple.
For simple pMCs, the graph-preserving instantiations are in $(0,1)^{|V|}$. Regions are
assumed to be well-defined, rectangular and closed, i.e., a region is a Cartesian
product of closed intervals, $R = \bigtimes_{x \in V} [\ell_x, u_x]$. Let $R(x)$ denote the interval
$[\ell_x, u_x]$ and $\mathit{occur}(s)$ the set of variables $\{x \in V \mid \exists s' \in S.\ x \in P(s,s')\}$. For
simple pMCs, this set has cardinality at most one. A state $s$ is called parametric
if $\mathit{occur}(s) \neq \emptyset$; we write $\mathit{occur}(s) = x$ if $\{x\} = \mathit{occur}(s)$.


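Example 1. Fig. 2(a) depicts a pMC. A region $R$ is given by $p \in [1/4, 1/2]$. An
instantiation $\vec{u} = \{p \mapsto 1/3\} \in R$ yields the pMC in Fig. 2(b). The solution
function is $\mathrm{Pr}^{s_0 \to T}_{M_1} = p \cdot (1 - p)$. Indeed, $\mathrm{Pr}^{s_0 \to T}_{M_1}(\vec{u}) = 2/9 = \mathrm{Pr}_{M_1[\vec{u}]}(\lozenge T)$.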
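The instantiation step of Example 1 can be reproduced mechanically. The following is a minimal sketch in Python, not the authors' implementation. Since the figure is not reproduced here, the transition structure of $M_1$ is an assumption inferred from the solution function $p \cdot (1-p)$: state `s0` moves to `s1` with probability $p$, and `s1` moves to the target with probability $1-p$. The state names `good` and `bad` and all helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding of M_1: every transition is a function of the
# parameter value p. The structure is an assumption that is consistent with
# the solution function p * (1 - p) stated in Example 1.
PMC_M1 = {
    "s0": {"s1": lambda p: p, "bad": lambda p: 1 - p},
    "s1": {"good": lambda p: 1 - p, "bad": lambda p: p},
}


def instantiate(pmc, p):
    """Replace every transition function by its value at p."""
    return {s: {t: f(p) for t, f in succ.items()} for s, succ in pmc.items()}


def reach_probability(mc, state):
    """Probability to reach 'good' in an acyclic, parameter-free MC."""
    if state == "good":
        return Fraction(1)
    if state == "bad":
        return Fraction(0)
    return sum(prob * reach_probability(mc, succ) for succ, prob in mc[state].items())


u = Fraction(1, 3)
mc = instantiate(PMC_M1, u)
value = reach_probability(mc, "s0")
print(value)  # 2/9
```

The recursion only works because the assumed chain is acyclic; a chain with cycles would require solving a linear equation system instead.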


3 Main Ingredients in a Nutshell 


To solve the problem statement, we consider an iterative method which analyzes
regions, and, if necessary, splits these regions. In particular, we combine two
approaches, parameter lifting and monotonicity checking, as shown in Fig. 1.


3.1 The Monotonicity Checker 


We consider local and global monotonicity. We start with defining the latter. 


Definition 2 (Global monotonicity). A continuously differentiable function $f$
on region $R$ is monotonic increasing in variable $x$, denoted $f{\uparrow}^{R}_{x}$, if
$\frac{\partial}{\partial x} f(\vec{u}) \geq 0$ for all $\vec{u} \in R$.³ The pMC $M = (S, s_I, T, V, P)$ is
monotonic increasing in parameter $x \in V$ on graph-preserving region $R$, written
$M{\uparrow}^{R}_{x}$, if $\mathrm{Pr}^{s_I \to T}{\uparrow}^{R}_{x}$.


3 To be precise, on the interior of the closed set R. 
[Figure 3, panels: (a) $M_3$; (b) $M_3$ with region $R$ such that $p \in [1/3, 1/2]$, $q \in [2/5, 3/4]$.]

Fig. 3. Simple pMC that indeed is an iMC.


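Monotonic decreasing, written $M{\downarrow}^{R}_{x}$, is defined analogously. Let $\mathit{succ}(s) = \{s' \in
S \mid P(s,s') \neq 0\}$ be the set of direct successors of $s$. Given the recursive equation
$\mathrm{Pr}^{s \to T} = \sum_{s' \in \mathit{succ}(s)} P(s,s') \cdot \mathrm{Pr}^{s' \to T}$ for every state $s$ other than ☺ and ☹, we have

$$M{\uparrow}^{R}_{x} \quad\text{iff}\quad \frac{\partial}{\partial x} \Big( \sum_{s' \in \mathit{succ}(s_I)} P(s_I,s') \cdot \mathrm{Pr}^{s' \to T} \Big)(\vec{u}) \geq 0$$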


for all t € R. Rather than checking global monotonicity, the monotonicity checker 
determines a subset of the locally monotone state-parameter pairs. Such pairs 
intuitively capture monotonicity of a parameter only locally at a state s. 


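Definition 3 (Local monotonicity). Function $\mathrm{Pr}^{s \to T}$ is locally monotonic
increasing in parameter $x$ (at state $s$) on region $R$, written $\mathrm{Pr}^{s \to T}{\uparrow}^{\ell,R}_{x}$, if

$$\forall \vec{u} \in R.\ \Big( \sum_{s' \in \mathit{succ}(s)} \Big( \frac{\partial}{\partial x} P(s,s') \Big) \cdot \mathrm{Pr}^{s' \to T} \Big)(\vec{u}) \geq 0.$$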

Thus, while global monotonicity considers the derivative of the entire solution
function, local monotonicity (in $s$) only considers the derivative of the first
transition (emanating from $s$). Local monotonicity of parameter $x$ in every state
implies global monotonicity of $x$, as shown in [27]. As checking global monotonicity
is co-ETR hard [27], a practical approach is to check sufficient conditions for
monotonicity. These conditions are based on constructing a pre-order on the
states of the pMC; this is explained in detail in Section 4.


Example 2. For $R = \{\vec{u}(p) \in [1/10, 9/10]\}$, pMC $M_1$ in Fig. 2(a) is locally monotonic
increasing in $p$ at $s_0$ and locally monotonic decreasing in $p$ at $s_1$. From
this, we cannot conclude anything about global monotonicity of $p$ on $R$. Indeed,
the pMC is not globally monotonic on $R$. $M_1$ is globally monotonic on
$R' = \{\vec{u}(p) \in [4/10, 1/2]\}$, but this cannot be concluded from the statement above.
In contrast, the pMC $M_2$ in Fig. 2(c) is locally monotonic increasing in $p$ at both
$s_0$ and $s_1$, and is therefore globally monotonic increasing in $p$.
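The claims of Example 2 about $M_1$ can be checked by hand, and the short sketch below mirrors that calculation. It is only an illustration under assumptions: the transition structure of $M_1$ (`s0` moves to `s1` with probability $p$, `s1` moves to the target with probability $1-p$) is inferred from the solution function $p \cdot (1-p)$, and all function names are hypothetical. Because every term involved is affine in $p$, its sign on an interval is decided by the two endpoints.

```python
from fractions import Fraction


def local_term_s0(p):
    # Derivatives of the outgoing transitions of s0 (+1 towards s1, -1 towards
    # the sink), weighted by the successors' reachability probabilities
    # Pr(s1) = 1 - p and Pr(sink) = 0.
    return (+1) * (1 - p) + (-1) * 0


def local_term_s1(p):
    # Derivatives of the outgoing transitions of s1 (-1 towards the target,
    # +1 towards the sink), weighted by Pr(target) = 1 and Pr(sink) = 0.
    return (-1) * 1 + (+1) * 0


def global_derivative(p):
    # Derivative of the assumed solution function p * (1 - p).
    return 1 - 2 * p


def classify(term, lower, upper):
    """Sign of an affine term on [lower, upper], decided at the endpoints."""
    values = (term(lower), term(upper))
    if all(v >= 0 for v in values):
        return "increasing"
    if all(v <= 0 for v in values):
        return "decreasing"
    return "neither"


wide = (Fraction(1, 10), Fraction(9, 10))
narrow = (Fraction(4, 10), Fraction(1, 2))
print(classify(local_term_s0, *wide))        # increasing
print(classify(local_term_s1, *wide))        # decreasing
print(classify(global_derivative, *wide))    # neither
print(classify(global_derivative, *narrow))  # increasing
```

The endpoint test is exact only for affine terms; for general solution functions the sign of the derivative cannot be read off two points.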


3.2 The Parameter Lifter 


The key idea of parameter lifting is to drop all parameter dependencies
(parameters that occur at multiple states in a pMC) by introducing fresh parameters.
The outcome is an interval Markov chain [17,21], which can be considered
a special case of pMCs in which no parameter occurs at multiple states.


Definition 4 (Interval MC). A pMC is a (simple) interval MC (iMC), if
$\mathit{occur}(s) \cap \mathit{occur}(s') = \emptyset$ for all states $s \neq s'$.


All iMCs in this paper are simple. We typically label transitions emanating from
state $s$ in an iMC with $x = \mathit{occur}(s)$ by $R(x) = [\ell_x, u_x]$.


Example 3. The pMC in Fig. 3(a) is an iMC. For a fixed $R$, the typical notation
is given in Fig. 3(b). For the pMC $M_1$ in Fig. 2(a), the parameter $p$ occurs at
states $s_0$ and $s_1$, so that this pMC is not an iMC.


Definition 5 (Relaxation). The relaxation of simple pMC $M = (S, s_I, T, V, P)$
is the iMC $\mathit{relax}(M) = (S, s_I, T, V', P')$ with $V' = \{x_s \mid s \in S, \mathit{occur}(s) \neq \emptyset\}$,
$P'(s,s') = P(s,s')[\mathit{occur}(s) \leftarrow x_s]$.


For state $s$ with $\mathit{occur}(s) = x$, let $\mathit{relax}(R)(x_s) = R(\mathit{occur}(s))$. Likewise, an
instantiation $\vec{u} \in R$ is mapped to $\mathit{relax}(\vec{u})$ by $\mathit{relax}(\vec{u})(x_s) = \vec{u}(\mathit{occur}(s))$.


Extremal reachability probabilities on iMCs are reached at the extremal
values of a region. Formally [25], for each state $s$ and region $R$ in pMC $M$:

$$\max_{\vec{u} \in R} \mathrm{Pr}^{s \to T}_{M}(\vec{u}) \;\leq\; \max_{\vec{u} \in \mathit{relax}(R)} \mathrm{Pr}^{s \to T}_{\mathit{relax}(M)}(\vec{u}). \qquad (1)$$

This result is a direct consequence of local monotonicity at all states implying
global monotonicity. The extremal values for the reachability probabilities in the
obtained iMCs are obtained by interpreting the iMCs as MDPs and applying
off-the-shelf MDP model checking. We call the right-hand side of (1) the upper
bound on $R$, denoted $U_R(s)$. Analogously, we define a lower bound $L_R(s)$.


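Example 4. The pMC $M_3$ in Fig. 3(a) is the relaxation of the pMC $M_1$ in
Fig. 2(a). Indeed, for $R = \{\vec{u}(p) \in [1/4, 3/4]\}$:

$$\max_{\vec{u} \in R} \mathrm{Pr}^{s_0 \to T}_{M_1}(\vec{u}) = 1/4 \;\leq\; 9/16 = \max_{\vec{u} \in \mathit{relax}(R)} \mathrm{Pr}^{s_0 \to T}_{M_3}(\vec{u}).$$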
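The numbers in Example 4 can be recomputed with a few lines of Python. This is a hedged sketch rather than the parameter-lifting implementation: it assumes that $M_1$ has the solution function $p \cdot (1-p)$ from Example 1, so that its relaxation has the solution function $p \cdot (1-q)$ with a fresh parameter $q$, and it replaces MDP model checking by a brute-force enumeration of the corners of the relaxed region. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def solution_m1(p):
    # Assumed solution function of the pMC M_1.
    return p * (1 - p)


def solution_relaxed(p, q):
    # Relaxation: the second occurrence of p is replaced by a fresh parameter q.
    return p * (1 - q)


def upper_bound(lower, upper):
    """Maximum of the relaxed solution function over the corners of the region."""
    corners = product((lower, upper), repeat=2)
    return max(solution_relaxed(p, q) for p, q in corners)


lower, upper = Fraction(1, 4), Fraction(3, 4)
bound = upper_bound(lower, upper)

# Evaluate the original solution function on a grid that contains p = 1/2.
samples = [lower + Fraction(k, 100) * (upper - lower) for k in range(101)]
best_sample = max(solution_m1(p) for p in samples)
print(bound, best_sample)  # 9/16 1/4
```

The grid search only serves to illustrate inequality (1) on this toy instance; it is not a proof that 1/4 is the maximum.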


3.3 Divide and Conquer 


Figure 4 shows how the extremal value for an input region, pMC $M$, reachability
property $\varphi$ and precision $\varepsilon$ can be computed using only parameter lifting [25].
This paper extends this iterative approach to include monotonicity checking. The
main idea is to analyze regions and split them if the result is inconclusive. The
approach uses a queue $Q$ of regions that need to be checked and the current extremal
value CurMax found so far. In particular, CurMax is a lower bound on the maximum,
and $\max_{\hat{R} \in Q} U_{\hat{R}}(s_I)$ is a (potentially trivial) upper bound; the goal is to
reach $\mathit{CurMax} + \varepsilon \geq \max_{\hat{R} \in Q} U_{\hat{R}}(s_I)$.
We iteratively check regions and improve both bounds until a satisfactory solution
is found. Initially, the queue only contains the input region. For a selected $R$ from the queue
we compute an upper bound $U_R$ with parameter lifting. If $U_R$ at the initial state
is below the current optimum, we can safely discard $R$. Otherwise, we
attempt to improve CurMax by guessing $\vec{u} \in R$ and computing $\mathrm{Pr}^{s_I \to T}_{M}(\vec{u})$ using
model checking.⁴ If $\mathrm{Pr}^{s_I \to T}_{M}(\vec{u})$ exceeds CurMax, we update CurMax. Now, we check
whether we can terminate.

In particular, the maximum is bounded from above by $\max_{\hat{R} \in Q \cup \{R\}} U_{\hat{R}}(s_I)$. If
this upper bound is at most $\mathit{CurMax} + \varepsilon$, we are done, and return CurMax together
with the $\vec{u}$ associated with CurMax. Otherwise, we continue and split $R$ into
smaller regions. By default, parameter lifting splits $R$ along all dimensions. This
algorithm converges in the limit [25].

[Figure 4: flow diagram. A region $R$ is taken from queue $Q$ and analysed by parameter lifting. If $U_R(s_I) < \mathit{CurMax}$, the region is discarded; else a point $\vec{u} \in R$ is guessed and CurMax is updated. If $\mathit{CurMax} + \varepsilon \geq \max_{\hat{R} \in Q \cup \{R\}} U_{\hat{R}}(s_I)$, the result CurMax is returned; else $R$ is split and the parts are added to $Q$.]

Fig. 4. Divide and conquer with pure parameter lifting


Example 5. Reconsider Ex. 4 and assume we want to show $\max_{\vec{u} \in R} \mathrm{Pr}^{s_0 \to T}_{M_1}(\vec{u}) \leq
1/4$, with $\varepsilon = 1/8$. We sample in (the middle of) $R$ and obtain $\mathit{CurMax} = 1/4$,
while the upper bound $U_R(s_0)$ from Ex. 4 is $9/16$. We split $R$ into two regions
$R_1 = \{\vec{u}(p) \in [1/4, 1/2]\}$ and $R_2 = \{\vec{u}(p) \in [1/2, 3/4]\}$. Parameter lifting reveals
that for both regions the bound is $3/8$. As $1/4 + 1/8 \geq 3/8$, the value $1/4$ is an ε-close instance.
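The loop of Fig. 4 can be sketched for the one-parameter instance of Example 5. The code below is an illustration under assumptions, not Storm's implementation: the solution function $p \cdot (1-p)$ of $M_1$ is taken as given, parameter lifting is replaced by a corner enumeration on the relaxation $p \cdot (1-q)$, the queue is a plain FIFO queue, and the guess is always the midpoint. The names `maximise`, `upper_bound` and `solution` are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import product


def solution(p):
    # Assumed solution function of M_1.
    return p * (1 - p)


def upper_bound(lo, hi):
    # Stand-in for parameter lifting: maximise the relaxation p * (1 - q)
    # over the corners of [lo, hi] x [lo, hi].
    return max(p * (1 - q) for p, q in product((lo, hi), repeat=2))


def maximise(region, epsilon):
    cur_max, best_point = Fraction(0), None
    queue = deque([region])
    while queue:
        lo, hi = queue.popleft()
        bound = upper_bound(lo, hi)
        if bound < cur_max:
            continue  # the region cannot contain a better point
        mid = (lo + hi) / 2
        value = solution(mid)
        if value > cur_max:
            cur_max, best_point = value, mid
        # Upper bounds of all regions that are still relevant.
        pending = [upper_bound(a, b) for a, b in queue] + [bound]
        if cur_max + epsilon >= max(pending):
            return cur_max, best_point
        queue.extend([(lo, mid), (mid, hi)])
    return cur_max, best_point


result = maximise((Fraction(1, 4), Fraction(3, 4)), Fraction(1, 8))
print(result)  # (Fraction(1, 4), Fraction(1, 2))
```

On this instance the loop stops in its second iteration: after the first split both halves have the bound 3/8, which matches the reasoning of Example 5.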


The remainder of this paper integrates monotonicity checking in this loop. 


This paper addresses three challenges: (Sect. 4) using state bounds in
the monotonicity checker; (Sect. 5) using local monotonicity in parameter
lifting; (Sect. 6) integrating monotonicity in the divide-and-conquer loop.


4 A New Rule for Sufficient Monotonicity 


As discussed in Section 3.1, we aim to analyse whether for a given region $R$,
parameter $x$ is locally monotonic at state $s$. The key ingredient is a pre-order
on the states of the pMC at hand that is used for checking sufficient conditions
for being locally monotonic. We define the pre-order and recap the “cheap” rules
for efficiently determining the pre-order as adopted from [27]. We add a new,
simple rule to this repertoire that lets us avoid the computationally “expensive”
rules using assumptions from [27]. The information needed to apply this new rule
readily comes from parameter lifting, as we will see.


4 Using an instantiation checker that reuses model-checking results from the last guess.


Ordering states for local monotonicity. Let us consider a conceptual example 
showing how a pre-order on states can be used for determining local monotonicity. 


Example 6. Consider the pMC $M_2$ in Fig. 2(c). We reason backwards that both
states are locally monotone increasing in $p$. First, observe that ☺ has a higher
probability to reach the target (1) than ☹ (0). Now, in $s_1$, increasing $p$ will move
more probability mass to ☺, and hence, it is locally monotone. Furthermore, we
know that the probability from $s_1$ is between ☺ and ☹. Now, for $s_0$ we can use
that increasing $p$ moves more probability mass to $s_1$, which we know has a higher
probability to reach the target than ☹.


As in [27], we determine local monotonicity by ordering states according to their
reachability probability.


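Definition 6 (Reachability order). A relation $\preceq_{R,T} \subseteq S \times S$ is a reachability
order with respect to $T \subseteq S$ and region $R$ if for all $s, t \in S$:

$$s \preceq_{R,T} t \quad\text{implies}\quad \big(\forall \vec{u} \in R.\ \mathrm{Pr}^{s \to T}(\vec{u}) \leq \mathrm{Pr}^{t \to T}(\vec{u})\big).$$

The order $\preceq_{R,T}$ is called exhaustive if the reverse implication also holds.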


The relation $\preceq_{R,T}$ is a reflexive (aka: non-strict) pre-order. The exhaustive
reachability order is the union of all reachability orders, and always exists. Unless
stated differently, let $\preceq$ denote the exhaustive reachability order. If the successor
states of a state $s$ are ordered, we can conclude local monotonicity in $s$:


Lemma 1. Let $s, s_1, s_2 \in S$ with $P(s,s_1) = x$ and $P(s,s_2) = 1 - x$. Then,
for each region $R$: $s_2 \preceq_{R,T} s_1$ implies $\mathrm{Pr}^{s \to T}{\uparrow}^{\ell,R}_{x}$.

This result suggests looking for a so-called “sufficient” reachability order:


Definition 7 (Sufficient reachability order). A reachability order $\preceq$ is
sufficient for parameter $x$ if for all states $s$ with $\mathit{occur}(s) = \{x\}$ and $s_1, s_2 \in
\mathit{succ}(s)$ it holds: $(s_1 \preceq s_2 \lor s_2 \preceq s_1)$.


Phrased differently, the reachability order $\preceq$ is sufficient for $x \in V$ if $(\mathit{succ}(s), \preceq)$
is a total order for all $s$ that have transitions labelled with $x$. Observe that in
contrast to an exhaustive order, a sufficient order does not need to exist.


Ordering states efficiently. Def. 6 provides a conceptually simple scheme to order
states $s_1$ and $s_2$: compute the rational functions $\mathrm{Pr}^{s_1 \to T}$ and $\mathrm{Pr}^{s_2 \to T}$, and compare
them. As the size of these multivariate rational functions can be exponential in
the number of parameters [16], this is not practically viable. To avoid this, [27]
has identified a set of rules that provide sufficient criteria to order states. Some
of these rules are conceptually based on the underlying graph of a pMC and are
computationally cheap; other rules reason about (a partial representation of) the
full rational function $\mathrm{Pr}^{s \to T}$ and are computationally expensive.
[Figure 5: panel (a) shows $M_4$.]

Fig. 5. Non-trivial pMCs for deducing monotonicity.


Example 7. Using bounds avoids expensive rules: see $M_4$ in Fig. 5(a). Let
$R = \{\vec{u}(q) \in [1/2, 3/4], \vec{u}(p) \in [1/2, 2/3]\}$. Using the solution functions $p^2 + (1-p) \cdot q$
and $q \cdot (1-q)$ for $s_1$ and $s_2$ yields $s_2 \preceq s_1$ on $R$. Such a rule is expensive, but the
cheaper graph-based rules analogous to Ex. 6 are not applicable. However, when
we use bounds from parameter lifting, we obtain $U_R(s_2) = 3/8$ and $L_R(s_1) = 1/2$,
so $U_R(s_2) \leq L_R(s_1)$ and thus $s_2 \preceq s_1$ on $R$. Bounds also simplify
graph-based reasoning, in particular in the presence of cycles. Consider $M_5$: as
$L_R(s_3) \geq U_R(s_4)$, with reasoning similar to Ex. 6 it follows that $s_2 \preceq s_1$, and
we immediately get results about monotonicity.


Our aim is to avoid applying the expensive rules from [27] by imposing a new
and, thanks to parameter lifting, cheap rule. To obtain this rule, we assume for
state $s$ and region $R$ to have bounds $L_R(s)$ and $U_R(s)$ at our disposal satisfying

$$L_R(s) \;\leq\; \mathrm{Pr}^{s \to T}(\vec{u}) \;\leq\; U_R(s) \quad\text{for all } \vec{u} \in R.$$

Such bounds can be trivially assumed to be 0 and 1 respectively, but the idea is
to obtain tighter bounds by exploiting the parameter lifter. This will be further
detailed in Section 5. A simple observation on these bounds yields a cheap rule
(provided these bounds can be easily obtained).


Lemma 2. For $s_1, s_2 \in S$ and region $R$: $L_R(s_1) \geq U_R(s_2)$ implies $s_2 \preceq_{R,T} s_1$.
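Lemma 2 amounts to a single comparison, and combined with Lemma 1 it yields local monotonicity directly from bounds. The sketch below illustrates this with the bounds quoted in Example 7; bounds that the example does not state are set to the trivial values 0 and 1. The function names and the state whose successors are `s1` and `s2` are hypothetical, and the "decreasing" case is the symmetric counterpart of Lemma 1 rather than a statement taken from the text.

```python
from fractions import Fraction


def below(lower, upper, s2, s1):
    """Lemma 2: L_R(s1) >= U_R(s2) implies s2 <= s1 in the reachability order."""
    return lower[s1] >= upper[s2]


def local_monotonicity(lower, upper, succ_x, succ_complement):
    """Lemma 1 for a state with P(s, succ_x) = x and P(s, succ_complement) = 1 - x."""
    if below(lower, upper, succ_complement, succ_x):
        return "increasing"
    if below(lower, upper, succ_x, succ_complement):
        return "decreasing"  # symmetric case, assumed by analogy
    return "unknown"


# Bounds from Example 7; missing entries fall back to the trivial bounds.
lower = {"s1": Fraction(1, 2), "s2": Fraction(0)}
upper = {"s1": Fraction(1), "s2": Fraction(3, 8)}

print(below(lower, upper, "s2", "s1"))               # True
print(below(lower, upper, "s1", "s2"))               # False
print(local_monotonicity(lower, upper, "s1", "s2"))  # increasing
```

The rule is sound but incomplete: when the bound intervals overlap, the answer is "unknown" and tighter bounds or another rule are needed.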


In the remainder of this section, we elaborate some technical details. 


Algorithmic reasoning. The pre-order $\preceq$ is stored by a representation of its Hasse
diagram, referred to as RO-graph. Evaluating whether two states are ordered
amounts to a graph search in the RO-graph. We start off with the initial order
☹ $\preceq$ ☺. Then we attempt to apply one of the cheap rules to a state $s$. Lemma 2
provides us with more potential to apply a cheap rule. The typical approach
is to do this in a reverse topological order over the RO-graph, such that the
successors of $s$ are already ordered as much as possible. If the successor states
of $s$ are ordered, then $s$ can be added as a vertex and directed edges can be
added between $s$ and its successors. Otherwise, state $s$ is added between ☹ and ☺.
This often allows for reasoning analogous to the example. To deal with strongly
connected components, rules exist that add states to the order even when not
all successors are in the graph. If no cheap rule can be applied, more expensive
rules using the rational functions from above or SMT solvers are applied.⁵


5 Parameter Lifting with Monotonicity Information 


Recall that our aim is to compute some $\lambda \geq \max_{\vec{u} \in R} \mathrm{Pr}^{s \to T}_{M}(\vec{u}) - \varepsilon$ for some fixed
region $R$. In order to do so, we compute $\lambda := \max_{\vec{u} \in \mathit{relax}(R)} \mathrm{Pr}^{s \to T}_{\mathit{relax}(M)}(\vec{u})$ on the
iMC $\mathit{relax}(M)$ obtained by relaxing the pMC $M$. We discuss how to speed up
this computation using local monotonicity information. In the remainder, let $D$
denote $\mathit{relax}(M)$ and $I$ denote $\mathit{relax}(R)$. As we consider simple iMCs, let state $s$
have $P(s,s_1) = x_s$ and $P(s,s_2) = 1 - x_s$, where the parameter $x_s$ does not occur
on other transitions. Assume the lower (upper) bound on $x_s$ is $\ell_s$ ($u_s$).


Analyzing (simple) iMCs. An iMC induces a maximal reachability bound by
substituting every $x_s$ with either $\ell_s$ or $u_s$. Formally, let $V(I)$ denote the corner
points of the interval $I$. Then,

$$\max_{\vec{u} \in I} \mathrm{Pr}^{s \to T}_{D}(\vec{u}) = \max_{\vec{u} \in V(I)} \mathrm{Pr}^{s \to T}_{D}(\vec{u}).$$

Thus, to maximise the probability to reach $T$, in every state $s$ either the lower or
the upper bound of parameter $x_s$ has to be chosen. This induces $O(2^{|S|})$ choices.
They can be efficiently navigated by interpreting these choices as nondeterministic
choices, interpreting the iMC as a Markov decision process (MDP) [25].


Local monotonicity helps. Assume local monotonicity establishes $s_1 \preceq s_2$, i.e., the
reachability probability from $s_2$ is at least as high as from $s_1$. To maximise the
reachability probability from $s$, the parameter $x_s$ should be minimised. Conversely,
if $s_2 \preceq s_1$, parameter $x_s$ should be maximised. Thus, every local monotonicity
result halves the number of vertices that we are maximising over.


Example 8. Consider the iMC $M_3$ in Fig. 3(a), which is the relaxation of the pMC
$M_1$ in Fig. 2(a). There are four combinations of lower and upper bounds that
need to be investigated to compute the upper bound. Using local monotonicity,
we deduce that $q$ should be as low as possible and $p$ as high as possible. Rather
than evaluating an MDP, we thus obtain the same upper bound on the reachability
probability in $M_1$ by evaluating a single parameter-free Markov chain.
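Example 8 can be replayed in code: instead of enumerating all four corners, the order of the successors fixes one bound per parameter. The sketch below is an illustration under assumptions. The structure of $M_3$ (parameter $p$ leads from `s0` to `s1`, parameter $q$ leads from `s1` to the sink) is inferred from the solution function $p \cdot (1-q)$, the region is the one from Example 4, and the reachability order is encoded as a hypothetical rank table.

```python
from fractions import Fraction
from itertools import product

REGION = {
    "p": (Fraction(1, 4), Fraction(3, 4)),
    "q": (Fraction(1, 4), Fraction(3, 4)),
}

# For each parameter x: (successor taken with probability x,
#                        successor taken with probability 1 - x).
SUCCESSORS = {"p": ("s1", "bad"), "q": ("bad", "good")}

# Assumed reachability order bad <= s1 <= good, encoded as ranks.
RANK = {"bad": 0, "s1": 1, "good": 2}


def reach(p, q):
    # Assumed solution function of the relaxed chain.
    return p * (1 - q)


def choose_bound(param):
    """Pick the upper bound if the x-successor is the better one, else the lower."""
    with_x, with_complement = SUCCESSORS[param]
    lower, upper = REGION[param]
    return upper if RANK[with_x] >= RANK[with_complement] else lower


fixed = reach(choose_bound("p"), choose_bound("q"))
brute_force = max(reach(p, q) for p, q in product(REGION["p"], REGION["q"]))
print(fixed, brute_force)  # 9/16 9/16
```

With both parameters fixed by monotonicity, a single evaluation replaces the maximisation over four corners.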


Accelerating value iteration. Parameter lifting [25] creates a single MDP, which is a
comparatively expensive operation, and instantiates this MDP based on the
region $R$ to be checked. For computing the bound $\lambda$ specifically, it uses value
iteration. Roughly, this means that for each state we start with either its lower
or upper bound. The instantiated MC is then checked. Then, all bounds that can
be improved by switching from lower to upper bound or vice versa are swapped.
This procedure terminates with the optimal assignment to all bounds. We exploit
the local monotonicity in this value iteration procedure by fixing the chosen
bounds at locally monotonic states.


5 In an attempt to reduce the cost of these rules, the algorithm allows for deferring
proof obligations in the form of assumptions. This is detailed in [27]. For this paper,
however, the only relevant aspect is that these rules are computationally expensive.

[Figure 6: flow diagram of the integrated loop. (1) An extended region $R$ with $L_R$, $U_R$ and its RO-graph is taken from queue $Q$; (2) monotonicity checking (Sect. 4); (3) shrink; (4) parameter lifting (Sect. 5) using local monotonicity; (5) discard if $U_{R'}(s_I) < \mathit{CurMax}$; (6) guess $\vec{u} \in R'$ and update CurMax; (7) return CurMax if $\mathit{CurMax} + \varepsilon \geq \max_{\hat{R} \in Q \cup \{R'\}} U_{\hat{R}}(s_I)$; (8) split $R'$.]

Fig. 6. The symbiosis of monotonicity checking and parameter lifting. Red are new
elements compared to the vanilla approach in Fig. 4.


6 Lifting and Monotonicity, Together


In this section, we give a more detailed account of our approach, i.e., we zoom
in on Fig. 1, resulting in Fig. 6. In particular, we detail the divide-and-conquer
block. This loop is a refinement (indicated in red in Fig. 6) of Fig. 4. We first
give an overview, before discussing some aspects in greater detail.


Overall algorithm. The approach considers extended regions, i.e., a region $R$
is equipped with state bounds $L_R(s)$ and $U_R(s)$ such that $L_R(s) \leq \mathrm{Pr}^{s \to T}_{M}(\vec{u}) \leq
U_R(s)$ for every state $s$ and all $\vec{u} \in R$, and with monotonicity information about the monotonic
increasing (and decreasing) parameters on $R$. Initially, the input region $R$ is
extended with $L_R(s) = 0$, $U_R(s) = 1$ for every $s$, and empty monotonicity
information. Additionally, we initialize a conservative approximation for the
maximum probability CurMax so far as 0. Extended regions are stored in the
priority queue $Q$, where $U_R(s_I)$ is used as priority. We discuss details below. Once
initialised, we start an iterative process to update the conservative approximation
of $L_R$ and $U_R$.

First, (1) a region $R$ and the associated reachability order stored as RO-graph
is taken from the queue $Q$ and (2) its monotonicity is computed while using the
annotated bounds $L_R$ and $U_R$. Let $X^{R}_{\uparrow}$ denote globally monotonic increasing
parameters on $R$, and similarly, let $X^{R}_{\downarrow}$ denote decreasing parameters on $R$. For
brevity, we omit the superscript $R$ in the following.

As a next step, we (3) shrink a region based on global monotonicity. We
define the region $\mathit{Shrink}_{X_\uparrow, X_\downarrow}(R)$ as follows: $\mathit{Shrink}_{X_\uparrow, X_\downarrow}(R)(x) = \ell_x$ if $x \in X_\downarrow$,
$\mathit{Shrink}(R)(x) = u_x$ if $x \in X_\uparrow$, and $\mathit{Shrink}(R)(x) = R(x)$ otherwise. In the
remainder of this section, let $R'$ denote $\mathit{Shrink}_{X_\uparrow, X_\downarrow}(R)$. Observe that we can
safely discard instantiations in $R \setminus R'$, as $\max_{\vec{u} \in R'} \mathrm{Pr}^{s_I \to T}_{M}(\vec{u}) = \max_{\vec{u} \in R} \mathrm{Pr}^{s_I \to T}_{M}(\vec{u})$.
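The shrinking step is a simple map over the parameter intervals. The following sketch assumes that a region is represented as a dictionary from parameter names to `(lower, upper)` pairs; this representation and the function name are illustrative choices, not the data structures used in Storm.

```python
def shrink(region, increasing, decreasing):
    """Collapse monotone parameters to the bound that maximises reachability."""
    shrunk = {}
    for param, (lower, upper) in region.items():
        if param in increasing:
            shrunk[param] = (upper, upper)  # increasing: the maximum is at the top
        elif param in decreasing:
            shrunk[param] = (lower, lower)  # decreasing: the maximum is at the bottom
        else:
            shrunk[param] = (lower, upper)  # unknown: keep the whole interval
    return shrunk


region = {"p": (0.25, 0.75), "q": (0.25, 0.75), "r": (0.1, 0.9)}
print(shrink(region, increasing={"p"}, decreasing={"q"}))
# {'p': (0.75, 0.75), 'q': (0.25, 0.25), 'r': (0.1, 0.9)}
```

If every parameter is monotone, the result is a single point, so no further splitting is required for that region.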

Next, we (4) analyse the region $R'$ to get bounds $L_{R'}$, $U_{R'}$ using parameter
lifting and using the local monotonicity information from the monotonicity
check. We make two observations. First, it holds that $L_R(s) \leq L_{R'}(s)$ and
$U_{R'}(s) \leq U_R(s)$ for every $s$. Thus, there is no regret in analysing $R'$ rather than
$R$. Second, if all parameters are globally monotone, the region $R'$ is a
singleton and straightforward to analyse.

If (5) $U_{R'}(s_I) < \mathit{CurMax}$, then we discard $R'$ altogether and go to (1). Otherwise,
we (6) guess a candidate $\vec{u} \in R'$, and set CurMax to $\max(\mathit{CurMax}, \mathrm{Pr}^{s_I \to T}_{M}(\vec{u}))$.
If (7) $\mathit{CurMax} + \varepsilon \geq \max_{\hat{R} \in Q \cup \{R'\}} U_{\hat{R}}(s_I)$, then we have solved our problem
statement by returning CurMax. Otherwise, we cannot yet give a conclusive
answer, and need to refine our analysis. To that end, we (8) split the region $R'$
into smaller (rectangular) regions $R_1, \ldots, R_n$. Note that these sub-regions first
inherit the bounds of the region $R'$; their bounds are refined in a subsequent
iteration (if any). Termination in the limit (i.e., convergence of the lower and
upper bound to the limit) follows from the termination of monotonicity checking
and the termination of the loop in Fig. 4.


Incrementality. A key aspect in tuning iterative approaches is the concept of
incrementality, i.e., reusing previously computed information in later computation
steps. Parameter lifting is already incremental by reusing the MDP structure
in an efficient manner. Let us address incrementality for the monotonicity checker.
Notice that all monotonicity information and all bounds that are computed for
region $R$ carry over to any $R' \subseteq R$. In particular, $s \preceq_{R,T} s'$ implies $s \preceq_{R',T} s'$.

Furthermore, our monotonicity checker may give up in an iteration if no 
cheap rules to determine monotonicity can be applied. In that case, we annotate 
the current reachability order such that after refining bounds, in a subsequent 
iteration, we can quickly check where we gave up in a last iteration, and whether 
refined bounds allow progress in constructing the reachability order. Notice that 
in principle, we have to duplicate the order for each region. However, we do this
only as long as the monotonicity checker has not stabilized. The checker stabilizes,
e.g., if an order is sufficient. Once the checker has stabilized, we do not duplicate the
order anymore (as no more local or global monotonicity can be deduced).


Heuristics. Our approach allows for several choices in the implementation.
Whereas the correctness of the approach does not depend on how to resolve these 
choices, they have a significant influence on the performance. We discuss (what 
we believe to be) the most important choices, and how we resolved these choices 
in the current implementation. 


Initialising CurMax. Previously, Storm was applicable only to few parameters and
generously initialized CurMax by sampling all vertices $V(R)$, which is exponential
in the number of parameters. To scale to more parameters, we discard this
sampling. Instead, we sample for each parameter independently to find out which
parameters are definitely not monotone. Naturally, we skip parameters already
known to be monotone. We select sample points as follows. We distribute
50 points evenly along the dimension of the parameter. All other parameter
values are fixed: non-monotonic parameters are set to the middle point of their
interval (as described by the region). Monotone parameters are set to the upper
(lower) bound when possibly monotone increasing (decreasing).
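The per-parameter sampling can be illustrated for a single dimension. The sketch below is a hedged example, not Storm's code: it places 50 evenly spaced points on one interval and reports whether the sampled values both rise and fall, which refutes monotonicity. The one-parameter function $p \cdot (1-p)$ from Example 1 serves as a stand-in for a solution function in which all other parameters have already been fixed; the function names are hypothetical.

```python
from fractions import Fraction


def sample_points(lower, upper, num_points=50):
    """Evenly spaced points on [lower, upper], endpoints included."""
    step = (upper - lower) / (num_points - 1)
    return [lower + k * step for k in range(num_points)]


def definitely_not_monotone(function, lower, upper, num_points=50):
    """True if the samples both rise and fall somewhere along the dimension."""
    values = [function(x) for x in sample_points(lower, upper, num_points)]
    rises = any(b > a for a, b in zip(values, values[1:]))
    falls = any(b < a for a, b in zip(values, values[1:]))
    return rises and falls


def solution(p):
    # Assumed one-parameter solution function, taken from Example 1.
    return p * (1 - p)


print(definitely_not_monotone(solution, Fraction(1, 10), Fraction(9, 10)))  # True
print(definitely_not_monotone(solution, Fraction(4, 10), Fraction(1, 2)))   # False
```

A result of `False` proves nothing: the parameter is merely not refuted and remains a candidate for the monotonicity checker.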


Updating CurMax. To prove that CurMax is close to the maximum, it is essential to
find a large value for CurMax fast. In our experience, sampling at too many places
within regions yields significant overhead, but taking $L_R(s_I)$ is too pessimistic
a way to update CurMax. To update CurMax, we select a single $\vec{u} \in R'$ in the middle
of region $R'$. As we may have shrunk the region $R$, the middle of $R'$ does not
need to coincide with the middle of $R$, which yields behavior different from the
vanilla refinement loop.


How and where to split? There are two important splitting decisions to be made.
First, we need to select the dimensions (aka: parameters) in which we split.
Second, we need to decide where to split along these dimensions. We had little
success with trivial attempts to split at better places, so the least informative
split in the middle remains our choice for splitting. However, we have changed
where (in which parameter or dimensions) to split. Naturally, we do not (need
to) split in monotonic parameters. Previously, parameter lifting split in every
dimension at once. Let us illustrate that this quickly becomes infeasible: Assume
10 parameters. Splitting the initial region once yields 1024 regions. Splitting half
of them again yields > 500,000 regions. Instead, we use region estimates, which
are heuristic values for every parameter, based on the implementation of [19].
These estimates, provided by the parameter lifter, essentially consider how well
the policy on the MDP (selecting upper or lower bounds in the iMC) agrees with
the dependencies induced by a parameter: The more it agrees, the lower the
value. The key idea is that one obtains tighter bounds if the policy adheres to
the dependencies induced by the parameter.⁶ We split in the dimension with
the largest estimate. If the region estimate is smaller than $10^{-4}$, then we split in
the dimension of $R$ with the widest interval.
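The blow-up argument and the selection rule can both be made concrete. The sketch below is illustrative only: the estimates are passed in as plain numbers (how the parameter lifter computes them is not modelled), the region is a dictionary of intervals, and the function names are hypothetical. It assumes that at least one parameter is not monotone.

```python
def count_after_full_splits(num_params):
    """Regions after one full split, and after fully splitting half of them again."""
    first = 2 ** num_params
    second = (first // 2) * 2 ** num_params + first // 2
    return first, second


def choose_split_dimension(region, estimates, monotone, threshold=1e-4):
    """Pick one non-monotone parameter to split (assumes at least one exists)."""
    candidates = [x for x in region if x not in monotone]
    best = max(candidates, key=lambda x: estimates[x])
    if estimates[best] < threshold:
        # All estimates are negligible: fall back to the widest interval.
        best = max(candidates, key=lambda x: region[x][1] - region[x][0])
    return best


print(count_after_full_splits(10))  # (1024, 524800)

region = {"p": (0.1, 0.9), "q": (0.2, 0.4), "r": (0.0, 1.0)}
print(choose_split_dimension(region, {"p": 0.3, "q": 0.7, "r": 0.9}, {"r"}))    # q
print(choose_split_dimension(region, {"p": 1e-6, "q": 1e-5, "r": 0.9}, {"r"}))  # p
```

Splitting a single dimension in the middle produces two sub-regions per step, in contrast to the 1024 produced by a full split with 10 parameters.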


Priorities in the region queue. Contrary to [25], we want to find the extremal value
within the complete region, rather than partitioning the state space. The
standard technique for the latter splits based on the size of the region, which is
de facto a breadth-first search. When we split a region $R$, we prioritize the subregions $R' \subseteq R$
by $U_R(s_I)$, as $U_{R'}(s_I) \leq U_R(s_I)$. We use the age of a region to break ties.
Here, a wide range of exploration strategies is possible. To avoid overfitting, we
refrain in the experiments from weighting different aspects of the region, but the
current choice is likely not the final answer.
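A queue with this priority can be sketched with Python's `heapq` module. The class below is an illustration, not Storm's data structure. The text only states that age breaks ties; that the older region is taken first is an assumption made here, and the class name and region labels are hypothetical.

```python
import heapq
from itertools import count


class RegionQueue:
    """Max-priority queue on the upper bound; ties go to the older region."""

    def __init__(self):
        self._heap = []
        self._age = count()  # insertion counter, smaller means older

    def push(self, region, upper_bound):
        # heapq is a min-heap, so the bound is negated.
        heapq.heappush(self._heap, (-upper_bound, next(self._age), region))

    def pop(self):
        negated_bound, _, region = heapq.heappop(self._heap)
        return region, -negated_bound

    def __len__(self):
        return len(self._heap)


queue = RegionQueue()
queue.push("R1", 0.375)
queue.push("R2", 0.5)
queue.push("R3", 0.5)
order = [queue.pop()[0] for _ in range(len(queue))]
print(order)  # ['R2', 'R3', 'R1']
```

Because the insertion counter is unique, two heap entries never compare equal on the first two components, so the region objects themselves are never compared.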


6 Technically, the value is computed as the sum of the differences between the local 
lower and upper bound on the reachability probability over all states with this 
parameter. 
Table 1. Overview of the experimental results comparing vanilla parameter lifting to
the integrated approach


e: 0.1 e: 0.05 

integrated vanilla integrated vanilla 
name instance|#states #trans |V] # i #ib t yi "EEEn t yi t 
NRP (5,1) 56 75 5 469 2 <1 2575 <1 5143 2  <1/48701 3 
(10,1) 86 250 10 66219 2 11) 512909 85]/7168029 2 1594 TO 
(12,1) 259 348 12|| 425643 2 98)3304325 757 TO TO 
(13,1) 300 403 13||1103811 2 299 TO TO TO 
(14,1) 344 462 14)/2608869 2 718 MO MO MO 
(15,1) 391 525 15 TO MO MO MO 
EVADE (1,2,0,1) 29 249 40 0 2 <1 2410 2 0 2 <1} 4619 4 
(1,2,3,1)| 513 993 160 0 2 8 MO 0 2 3 MO 
(1,2,0,2) 425 842 141 0 2 2 MO 0 2 2 MO 
(1,2,3,2) 1697 3362 561 0 2 21 MO 0 2 22 MO 
Herman (11,10) | 21500 242926 1 3 3 3 J3 2 9 3 14 9 3 
(11,15) 31740 369706 1 5 3 14 5 3 11 3 25 11 5 
(13,15) | 126888 1713246 iL 7 5 44 7 18 11 6 440 11 24 
(13,25) | 208808 2889206 1 7 5 91 7 31 11 6 1415 11 41 
(13,35) | 290728 4065166 1 5 4 128 5 85 TO 11 54 
Maze (25) 360 660 24 0 2 <i 1. 1 0 2 <1 40 <1 
(1000) 14985 26985 999 0 2° 1 <1 0 2 1 MO 
(10000) | 149985 269985 9999 0 2 166 1 <1 0 2 182 TO 


Obtaining bounds for the monotonicity checker. While the baseline loop only 
computes upper-bounds, we use lower bounds to boost the monotonicity checking. 
We currently run these bounds until the monotonicity checker has stabilized. We 
observe that, mostly due to numerical computations, the time that the lower 
bounds take can be significant, but the overhead and the merits of getting larger 
lower bounds are hard to forecast. 


7 Empirical Evaluation 


Setup. We investigate the performance of the extended divide-and-conquer 
approach presented in Fig. a We have implemented the algorithm explained 
above in the probabilistic model checker Storm [11]. We compare its performance 
with vanilla parameter lifting, outlined in Fig. mA baseline. Both versions use 
the same underlying data structures and version of Storm. All experiments were 
executed on a single core of an Intel Xeon Platinum 8160 CPU. We used neither 
parallel processing nor randomization. We used a time-out of 1800 s and a 
memory limit of 32GB. We exclude model-building times from all experiments 
and emphasize that they coincide for the vanilla and new implementations. 


Benchmarks and results. The common benchmarks Crowds, BRP, and Zeroconf 
have only globally monotonic parameters (and only two). Using monotonicity, 
they become trivial. The structure of NAND and Consensus makes them not 
amenable to monotonicity checking, and the performance mostly resembles the 
baseline. We selected additional benchmarks from QI, 23], and [18], see below. 
The models from the latter two sources are originally formulated as partially 
observable MDPs and were translated into pMC using the approach in [19]. 
Table 1 summarizes the results for benchmarks identified by their name and 
instance. We list the number of states, transitions and parameters of the pMC. 
For each benchmark, we consider two values for ε: ε = 0.05 and ε = 0.1. For each 
ε, we consider the time t required and the number (i) of iterations that the 
integrated loop and the baseline require. For the integrated loop, we additionally 
provide the number ($i_b$) of extra (lower bound) parameter lifting invocations 
needed to assist the monotonicity checker. 


Discussion of the results. We make the following observations. 


— NRP: this model is globally monotonic in all its parameters. Our monotonicity 
checker can find this one parameter. The integrated approach is an order of 
magnitude faster on all instances, scaling to more parameters. 

— Evade: this model is globally monotonic in some of its parameters. Our 
monotonicity check can find this monotonicity for a subset. The integrated 
approach is faster on all instances, as a better initial CurMax is guessed based 
on the results from the monotonicity checker. 

— Herman’s protocol: this is a less favourable benchmark for the integrated 
approach as only one parameter is not globally monotonic. The calculation 
of the bounds for the monotonicity checking yields significant overhead. 

— Maze: this model is globally monotonic in all its parameters. This can be 
found directly by the monotonicity checker, so we are left to check a single 
valuation. This valuation is also provably the optimal valuation. 


In general, for ε = 0.1, the number of regions that need to be considered is relatively 
small and guessing an (almost) optimal value is not that important. This means 
that the results are less sensitive to changes in the heuristic. For ε = 0.05, it is 
significantly trickier to get this right. Monotonicity helps us in guessing a good 
initial point. Furthermore, it tells us in which parameters we should and should 
not split. Therefore, we prevent unnecessary splitting in some of the parameters. 


8 Conclusion and Future Work 


This paper has presented a new technique for tackling the optimal synthesis 
problem: what is the instance of a parametric Markov chain that satisfies a 
reachability objective in an optimal manner? The key concept is a deep interplay 
between parameter lifting, the favourable technique so far for this problem, and 
monotonicity checking. Experiments showed encouraging results: speed ups of up 
to two orders of magnitude for various benchmarks, and an increased number 
of parameters. Future work consists of including advanced sampling techniques 
and applying this approach to other application areas such as optimal synthesis 
and monotonicity in probabilistic graphical models and hyper-properties in 
security [1]. 
Abstract. This paper presents a novel method for the automated synthesis 
of probabilistic programs. The starting point is a program sketch 
representing a finite family of finite-state Markov chains with related but 
distinct topologies, and a reachability specification. The method builds on 
a novel inductive oracle that greedily generates counter-examples (CEs) 
for violating programs and uses them to prune the family. These CEs 
leverage the semantics of the family in the form of bounds on its best- 
and worst-case behaviour provided by a deductive oracle using an MDP 
abstraction. The method further monitors the performance of the synthe- 
sis and adaptively switches between inductive and deductive reasoning. 
Our experiments demonstrate that the novel CE construction provides 
a significantly faster and more effective pruning strategy leading to an 
accelerated synthesis process on a wide range of benchmarks. For challeng- 
ing problems, such as the synthesis of decentralized partially-observable 
controllers, we reduce the run-time from a day to minutes. 


1 Introduction 


Background and motivation. Controller synthesis for Markov decision processes 
(MDPs [35]) and temporal logic constraints is a well-understood and tractable 
problem, with a plethora of mature tools providing efficient solving capabilities. 
However, the applicability of these controllers to a variety of systems is limited: 
Systems may be decentralized, controllers may not be able to observe the complete 
system state, cost constraints may apply, and so forth. Adequate operational 
models for these systems exist in the form of decentralized partially-observable 
MDPs (DEC-POMDPs [33]). The controller synthesis problem for these models 
is undecidable [30], and tool support (for verification tasks) is scarce. 

This paper takes a different approach: the controller together with the en- 
vironment can be modelled as probabilistic program sketches where “holes” in 
the probabilistic program model choices that the controller may make. Concep- 
tually, the controllers of the DEC-POMDP are described by a user-defined finite 


* This work has been partially supported by the Czech Science Foundation grant 
GJ20-02328Y and the ERC AdG Grant 787914 FRAPPANT, the NSF grants 1545126 
(VeHICaL) and 1646208, by the DARPA Assured Autonomy program, by Berkeley 
Deep Drive, and by Toyota under the iCyPhy center. 

© The Author(s) 2021 


J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 191-209, 2021. 
https://doi.org/10.1007/978-3-030-72016-2_11 


192 R. Andriushchenko et al. 


family $\mathcal{M}$ of Markov chains. The synthesis problem that we consider is to find 
a Markov chain $M$ (i.e., a probabilistic program) in the family $\mathcal{M}$, such that 
$M \models \varphi$, where $\varphi$ is the specification. To allow efficient algorithms, the family must 
have some structure. In particular, in our setting, the family is parameterized 
by a set of discrete parameters $K$; an assignment $K \to V$ of these parameters 
with concrete values V from its associated domain yields a family member, i.e., 
a Markov chain (MC). Such a parameterization is naturally obtained from the 
probabilistic program sketch, where some constants (or program parts) can be 
left open. The search for a family member can thus be considered as the search 
for a hole-assignment. This approach fits within the realm of syntax-guided 
synthesis [2]. 


Motivating example. Herman’s protocol [24] is a well-studied randomized dis- 
tributed algorithm aimed to obtain fast stabilization on average. In [26], a 
family M of MCs is used to model different protocol instances. They considered 
each instance separately, and found which of the controllers for Herman’s protocol 
performs best. Let us consider the protocol in a bit more detail: It considers 
self-stabilization of a unidirectional ring of network stations where all stations 
have to behave similarly—an anonymous network. Each station stores a single bit, 
and can read the internal bit of one (say left) neighbour. To achieve stabilization, 
a station for which the two legible bits coincide updates its own bit based on 
the outcome of a coin flip. The challenge is to select a controller that flips this 
coin with an optimal bias, i.e., minimizing the expected time until stabilization. 
In a setting where the probabilities range over 0.1,0.2,...,0.9, this results in 
analyzing nine different MCs. Does the expected time until stabilization reduce 
if the controllers are additionally allowed to have a single bit of memory? In 
every step, there are 9·9 combinations for selecting the coin flip and for each 
memory cell and coin flip outcome, the memory can now be updated, yielding 
2·2·2 possibilities. This one-bit extension thus results in a family of 648 models. 
If, in addition, one allows stations to make decisions depending on the token-bits, 
both the coin flips and the memory updates are multiplied by a factor 4, yielding 
10,368 models. Eventually, analyzing all individual MCs is infeasible. 
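
The family sizes quoted above can be re-derived with a few multiplications. The snippet below only restates the counts given in the text; the variable names are illustrative.

```python
coin_biases = 9                      # probabilities 0.1, 0.2, ..., 0.9
memoryless_family = coin_biases      # nine MCs without memory

coin_choices = 9 * 9                 # combinations for selecting the coin flip
memory_updates = 2 * 2 * 2           # memory update possibilities, as counted in the text
one_bit_family = coin_choices * memory_updates

# Decisions may additionally depend on the token-bits: both the coin flips
# and the memory updates are multiplied by a factor 4.
token_family = one_bit_family * 4 * 4

print(memoryless_family, one_bit_family, token_family)
```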


Oracle-guided synthesis. To tackle the synthesis problem, we introduce an 
oracle-guided inductive synthesis approach [25,39]. A learner selects a family member and 
passes it to the oracle. The oracle answers whether the family member satisfies $\varphi$, 
and crucially, gives additional information in case this is not the case. Inspired 
by [9], if the family member violates the specification $\varphi$, our oracle returns a set 
$K'$ of parameters such that all family members obtained by changing only the 
values assigned to $K'$ violate $\varphi$. We argue that such an oracle must (1) induce 
little overhead in providing $K'$, (2) be aware of the existence of parameters in 
the family, and (3) have (some semblance of) awareness about the semantics of the 
parameters and their values. 


Oracles. With these requirements in mind, we construct a counterexample (CE)- 
based oracle from scratch. We do so by carefully exploiting existing methods. 
We construct critical subsystems as CEs [1]. Critical subsystems are parts of 
the MC that suffice to refute the specification. If a hole is absent in a CE, 
its value is irrelevant. To avoid the cost of finding optimal CEs—an NP-hard 
problem [19]—we consider greedy CEs that are similar to [9]. However, our greedy 
CEs are aware of the parameters, and try to limit the occurrence of parameters 
in the CE. Finally, to provide awareness of the semantics of parameter values, 
we provide lower and upper bounds on all states: Their difference indicates how 
much varying the value at a hole may change the overall reachability probability. 
These bounds are efficiently computed by another oracle. This oracle analyses a 
quotient MDP obtained by employing an abstraction method that is part of the 
abstraction-refinement loop in [10]. 


A hybrid variant. The two oracles are significantly different. Abstraction refine- 
ment is deductive: it argues about single family members by considering (an 
aggregation of) all family members. The critical subsystem oracle is inductive: 
by examining a single family member, it infers statements about other family 
members. This suggests a middle ground: a hybrid strategy monitors the per- 
formance of the two oracles during the synthesis and suggests their best usage. 
More precisely, the hybrid strategy integrates the counterexample-based oracle 
into the abstraction-refinement loop. 


Major results. We present a novel and dedicated oracle deployed in an efficacious 
synthesis loop. We use model-checking results on an abstraction to tailor smaller 
CEs. Our greedy and family-aware CE construction is substantially faster than 
the use of optimal CEs. Together, these two improvements yield CEs that are on 
par with optimal CEs, but are found much faster. The integration of multiple 
abstraction-refinement steps yields a superior performance. We compare our 
performance with the abstraction-refinement loop from [10] using benchmarks 
from [10]. Benchmarks can be classified along two dimensions: (A) Benchmarks 
with a structure good for CE-generation. (B) Benchmarks with a structure good 
for abstraction-refinement. A-benchmarks are a natural strength of our novel 
oracle. Our simple, efficient hybrid strategy significantly outperforms the state-of- 
the-art on A-benchmarks, while it only yields limited overhead for B-benchmarks. 
Most importantly, the novel hybrid strategy can solve benchmarks that are 
out of reach for pure abstraction-refinement or pure CE-based reasoning. In 
particular, our hybrid method is able to synthesize the optimal Herman protocol 
with memory: the synthesis time on a design space with 3.1 million candidate 
programs reduces from a day to minutes. 


Related work. The synthesis problems for parametric probabilistic systems can 
be divided into the following two categories. 


Topology synthesis, akin to the problem considered in this paper, assumes a finite 
set of parameters affecting the MC topology. Finding an instantiation satisfying 
a reachability property is NP-complete in the number of parameters [12], and 
can naively be solved by analyzing all individual family members. An alternative 
is to model the MC family by an MDP and resort to standard MDP model- 
checking algorithms. Tools such as ProFeat [13] or QFLan [40] take this approach 
to quantitatively analyze alternative designs of software product lines [21,28]. 
These methods are limited to small families. This motivated (1) abstraction-refinement 
over the MDP representation [10], and (2) counterexample-guided 
inductive synthesis (CEGIS) for MCs [9], mentioned earlier. The alternative 
problem of sketching for probabilistic programs that fit given data is studied, 
e.g., in [32,38]. 


Parameter synthesis considers models with uncertain parameters associated to 
transition probabilities, and analyses how the system behaviour depends on 
the parameter values. The most promising techniques are based on parameter 
lifting that treats identical parameters in different transitions independently [8,36] 
and has been implemented in the state-of-the-art probabilistic model checkers 
Storm [18] and PRISM [27]. An alternative approach based on building rational 
functions for the satisfaction probability has been proposed in [15] and further 
improved in [22,17,4]. This approach has been also applied to different problems 
such as model repair [5,34,11]. 

Both synthesis problems can be also attacked by search-based techniques that 
do not ensure an exhaustive exploration of the parameter space. These include 
evolutionary techniques [23,31] and genetic algorithms [20]. Combinations with 
parameter synthesis have been used [7] to synthesize robust systems. 


2 Problem Statement 


We formalize the essential ingredients and the problem statement. See [3] for 
more material. 


Sets of Markov chains. A (discrete) distribution over a finite set $X$ is a function 
$\mu\colon X \to [0,1]$ s.t. $\sum_{x \in X} \mu(x) = 1$. The set $Distr(X)$ contains all distributions over 
$X$. The support of $\mu \in Distr(X)$ is $supp(\mu) = \{x \in X \mid \mu(x) > 0\}$. 


Definition 1 (MC). A Markov chain (MC) is a tuple $D = (S, s_0, P)$, where 
$S$ is a finite set of states, $s_0 \in S$ is an initial state, and $P\colon S \to Distr(S)$ is 
a transition probability function. We write $P(s,t)$ to denote $P(s)(t)$. The state $s$ 
is absorbing if $P(s,s) = 1$. 


Let $K$ denote a finite set of discrete parameters with finite domain $V_k$. For 
brevity, we often assume that all domains are the same, and omit the subscript 
$k$. A realization $r$ maps parameters to values in their domain, i.e., $r\colon K \to V$. 
Let $\mathcal{R}^{\mathcal{D}}$ denote the set of all realizations of a set $\mathcal{D}$ of MCs. A $K$-parameterized 
set of MCs $\mathcal{D}(K)$ contains the MCs $D_r$, for every $r \in \mathcal{R}^{\mathcal{D}}$. In Sect. 3, we give an 
operational model for such sets. In particular, realizations will fix the targets of 
transitions. In our experiments, we describe these sets using the PRISM modelling 
language where parameters are described by undefined integer values. 


Properties and specifications. For simplicity, we consider (unbounded) reachability 
properties¹. For a set $T \subseteq S$ of target states, let $\mathbb{P}[D, s \models \Diamond T]$ denote 
the probability in MC $D$ to eventually reach some state in $T$ when starting 
in the state $s \in S$. A property $\varphi = \mathbb{P}_{\bowtie\lambda}[\Diamond T]$ with $\lambda \in [0,1]$ and $\bowtie \in \{\le, \ge\}$ 
expresses that the probability to reach $T$ relates to $\lambda$ according to $\bowtie$. If 
$\bowtie = \le$, then $\varphi$ is a safety property; otherwise, it is a liveness property. Formally, 
state $s$ in MC $D$ satisfies $\varphi$ if $\mathbb{P}[D, s \models \Diamond T] \bowtie \lambda$. The MC $D$ satisfies $\varphi$ if the 
above holds for its initial state. A specification is a set of properties $\Phi = \{\varphi_i\}_{i \in I}$, 
and $D \models \Phi$ if $\forall i \in I\colon D \models \varphi_i$. 

¹ Our implementation also supports expected reachability rewards. 
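
To make the definitions concrete, the following Python sketch approximates the reachability probability of a small MC and checks a property against it. It is a minimal illustration under assumptions: the MC is represented as a dictionary of successor distributions, value iteration stands in for a proper model checker, and the four-state chain is hypothetical.

```python
def reach_prob(P, targets, s, sweeps=1000):
    """Approximate P[D, s |= <>T] by value iteration from below.

    P maps each state to its successor distribution {successor: probability}.
    """
    x = {u: 1.0 if u in targets else 0.0 for u in P}
    for _ in range(sweeps):
        x = {u: 1.0 if u in targets else sum(p * x[v] for v, p in P[u].items())
             for u in P}
    return x[s]


def satisfies(P, s0, targets, op, lam):
    """Check P_{op lam}[<>T] in s0; op is "<=" (safety) or ">=" (liveness)."""
    p = reach_prob(P, targets, s0)
    return p <= lam if op == "<=" else p >= lam


# Hypothetical MC: s0 -> s1 -> {t: 0.8, f: 0.2}; t and f are absorbing.
P = {
    "s0": {"s1": 1.0},
    "s1": {"t": 0.8, "f": 0.2},
    "t": {"t": 1.0},
    "f": {"f": 1.0},
}
prob = reach_prob(P, {"t"}, "s0")
print(prob, satisfies(P, "s0", {"t"}, "<=", 0.3))
```

A specification is then checked by requiring `satisfies` to hold for each of its properties.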


Problem statement. The key problem statement in this paper is feasibility: 


Given a parameterized set of Markov chains $\mathcal{D}(K)$ over parameters $K$ and 
a specification $\Phi$, find a realization $r\colon K \to V$ such that $D_r \models \Phi$. 


When $\mathcal{D}$ is clear from the context, we often write $r \models \Phi$ to denote $D_r \models \Phi$. 

We additionally consider the optimizing variant of the synthesis problem. 
The maximal synthesis problem asks: given a maximizing property $\varphi_{\max} = 
\mathbb{P}_{\max}[\Diamond T]$, identify $r^* \in \arg\max_{r \in \mathcal{R}^{\mathcal{D}}} \{\mathbb{P}[D_r \models \Diamond T] \mid D_r \models \Phi\}$ provided it exists. 
The minimal synthesis problem is defined analogously. 

As the state space S, the set K of parameters, and their domains are all finite, 
the above synthesis problems are decidable. One possible solution, called the 
one-by-one approach [14], considers each realization $r \in \mathcal{R}^{\mathcal{D}}$. The state-space and 
parameter-space explosion renders this approach unusable for large problems, 
necessitating the usage of advanced techniques that exploit the family structure. 
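
A minimal sketch of the one-by-one approach is given below. The function name and the `check` callback (which stands in for building $D_r$ and model checking it) are assumptions for illustration; the loop makes the exponential cost visible, since it visits up to the product of all domain sizes.

```python
from itertools import product


def one_by_one(domains, check):
    """Enumerate all realizations and return the first one accepted by check.

    domains: {parameter: list of values}
    check:   callable taking a realization {parameter: value}; in a real
             setting it would build D_r and model check it (assumed here).
    """
    params = sorted(domains)
    for values in product(*(domains[k] for k in params)):
        r = dict(zip(params, values))
        if check(r):
            return r
    return None


domains = {"X": ["s1", "s2"], "Y": ["t", "f"]}
# Hypothetical check: only X = s2, Y = f is accepted.
found = one_by_one(domains, lambda r: r == {"X": "s2", "Y": "f"})
print(found)
```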


3 Counterexample-Guided Inductive Synthesis 


In this section, we recap a baseline for a counterexample-guided inductive syn- 
thesis (CEGIS) loop, as put forward in [9]. In particular, we first instantiate an 
oracle-guided synthesis method, discuss an operational model for families, giving 
structure to the parameterized set of Markov chains, and finally detail the usage 
of CEs to create an oracle. 

Consider Fig. 1. A learner takes a set $R$ of realizations, and has to find a 
realization $r$ whose MC $D_r$ satisfies the specification $\Phi$. The learner maintains 
(a symbolic representation of) a set $Q \subseteq R$ of realizations that need to be checked. 
It iteratively asks the oracle whether a particular $r \in Q$ is a solution. If it is 
a solution, the oracle reports success. Otherwise, the oracle returns a set $R'$ 
containing $r$ and potentially more realizations all violating $\Phi$. The learner then 
prunes $R'$ from $Q$. In Section 4, we focus on creating an efficient oracle that 
computes a set $R'$ (with $r \in R'$) of realizations that are all violating $\Phi$. In 
Section 5, we provide a more advanced framework that extends this method. The 
remainder of this section lays the groundwork for these sections. 

Fig. 1. Oracle-guided synthesis 
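
The learner-oracle interaction can be sketched as follows. This is an illustrative skeleton under assumptions: realizations are hashable values, the set $Q$ is kept explicitly rather than symbolically, and the toy oracle is hypothetical.

```python
def synthesize(realizations, oracle):
    """Oracle-guided loop: returns a satisfying realization or None.

    oracle(r) returns (True, _) if r is a solution, and otherwise
    (False, R') with R' a set of violating realizations.
    """
    queue = list(realizations)            # Q: realizations still to be checked
    while queue:
        r = queue[0]
        satisfied, violating = oracle(r)
        if satisfied:
            return r
        pruned = set(violating) | {r}     # r itself is always pruned
        queue = [q for q in queue if q not in pruned]
    return None


calls = []


def toy_oracle(r):
    """Hypothetical oracle: 8 and 9 are solutions; failures generalize to blocks of four."""
    calls.append(r)
    if r >= 8:
        return True, set()
    return False, {q for q in range(10) if q // 4 == r // 4}


result = synthesize(range(10), toy_oracle)
print(result, calls)
```

The larger the returned set $R'$, the fewer oracle calls are needed: here three calls suffice for ten candidates.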
Families of Markov chains. To avoid the need to iterate over all realizations, 
an efficient oracle exploits some structure of the family. In this paper, we focus on 
sets of Markov chains having different topologies. We explain our concepts using 
the operational model of families given in [10]. Our implementation supports 
(more expressive) PRISM programs with undefined integer constants. 


Definition 2 (Family of MCs). A family of MCs is a tuple $\mathcal{D} = (S, s_0, K, \mathcal{B})$ 
with $S$ and $s_0$ as before, $K$ is a finite set of parameters with domains $V_k \subseteq S$ for 
each $k \in K$, and $\mathcal{B}\colon S \to Distr(K)$ is a family of transition probability functions. 


Function $\mathcal{B}$ of a family $\mathcal{D}$ of MCs maps each state to a distribution over 
parameters $K$. In the context of the synthesis of probabilistic models, these parameters 
represent unknown options or features of a system under design. Realizations are 
now defined as follows. 


Definition 3 (Realization). A realization of a family $\mathcal{D} = (S, s_0, K, \mathcal{B})$ of MCs 
is a function $r\colon K \to S$ s.t. $r(k) \in V_k$, for all $k \in K$. We say that realization $r$ 
induces MC $D_r = (S, s_0, \mathcal{B}_r)$ iff $\mathcal{B}_r(s, s') = \sum_{k \in K,\, r(k) = s'} \mathcal{B}(s)(k)$ for any pair of 
states $s, s' \in S$. The set of all realizations of $\mathcal{D}$ is denoted as $\mathcal{R}^{\mathcal{D}}$. 


The set $\mathcal{R}^{\mathcal{D}} = \prod_{k \in K} V_k$ of all possible realizations is exponential in $|K|$. 
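
The following sketch shows how a realization induces an MC from a family. The dictionary encoding and the three-state family are assumptions chosen for illustration; they are not the PRISM-based representation used in the implementation.

```python
import math


def induce(B, r):
    """Build B_r: every parameter k in B(s) is replaced by its target r(k).

    B: {state: {parameter: probability}}, r: {parameter: state}.
    Probabilities of parameters mapped to the same state are summed.
    """
    P = {}
    for s, dist in B.items():
        P[s] = {}
        for k, p in dist.items():
            P[s][r[k]] = P[s].get(r[k], 0.0) + p
    return P


# Hypothetical family with two genuine parameters K1, K2.
B = {"a": {"K1": 0.5, "K2": 0.5}, "b": {"Kb": 1.0}, "c": {"Kc": 1.0}}
domains = {"K1": ["b", "c"], "K2": ["b", "c"], "Kb": ["b"], "Kc": ["c"]}

same = induce(B, {"K1": "b", "K2": "b", "Kb": "b", "Kc": "c"})
split = induce(B, {"K1": "b", "K2": "c", "Kb": "b", "Kc": "c"})
num_realizations = math.prod(len(v) for v in domains.values())
print(same["a"], split["a"], num_realizations)
```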


Counterexample-guided oracles. We first consider the feasibility synthesis for 
a single-property specification and later, cf. Remark 1, generalize this to multiple 
properties and to optimal synthesis. The notion of counterexamples is at the 
heart of the oracle from [9] and Sect. 4. 

If an MC $D \not\models \varphi$, a counterexample (CE) based on a critical subsystem can 
serve as diagnostic information about the source of the failure. We consider the 
following CE, motivated by the notion of critical subsystem in [37]. 


Definition 4 (Counterexample). Let $D = (S, s_0, P)$ be an MC with $s_\bot \notin S$. 
The sub-MC of $D$ induced by $C \subseteq S$ is the MC $D{\downarrow}C = (S \cup \{s_\bot\}, s_0, P')$, where 
the transition probability function $P'$ is defined by: 

$$P'(s) = \begin{cases} P(s) & \text{if } s \in C, \\ [s_\bot \mapsto 1] & \text{otherwise.} \end{cases}$$

The set $C$ and the sub-MC $D{\downarrow}C$ are called a counterexample (CE) for the property 
$\mathbb{P}_{\le\lambda}[\Diamond T]$ on MC $D$, if $D{\downarrow}C \not\models \mathbb{P}_{\le\lambda}[\Diamond(T \cap (C \cup \{s_0\}))]$. 


Let $D_r$ be an MC violating the specification $\varphi$. To compute other realizations 
violating $\varphi$, the oracle computes a critical subsystem $D_r{\downarrow}C$, which is then used 
to deduce a so-called conflict for $D_r$ and $\varphi$. 


Definition 5 (Conflict). For a family of MCs $\mathcal{D} = (S, s_0, K, \mathcal{B})$ and $C \subseteq S$, the 
set $K_C$ of relevant parameters (called conflict) is given by $\bigcup_{s \in C} supp(\mathcal{B}(s))$. 
Fig. 2. Counterexamples for smaller conflicts. 


It is straightforward to compute a set of violating realizations from a conflict. A 
generalization of realization $r$ induced by the set $K_C \subseteq K$ of relevant parameters 
is the set $r{\uparrow}K_C = \{r' \in \mathcal{R}^{\mathcal{D}} \mid \forall k \in K_C\colon r(k) = r'(k)\}$. We often use the term 
conflict to refer to its generalization. The size of a conflict, i.e., the number 
$|K_C|$ of relevant parameters, is crucial. Small conflicts potentially lead to 
generalizing $r$ to larger subfamilies $r{\uparrow}K_C$. It is thus important that the CEs 
contain as few parameterized transitions as possible. The size of a CE in terms 
of the number of states is not of interest. Furthermore, the overhead of providing 
CEs should be bounded by the payoff: Finding a large generalization 
may take some time, but small generalizations should be returned quickly. The 
CE-based oracle in [9] uses an off-the-shelf CE procedure [16,41], and mostly 
does not provide small CEs. 
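
The effect of the conflict size on the generalization can be sketched as follows. The function name and the explicit enumeration are assumptions for illustration (an implementation would keep the set symbolic); the two-parameter domains are hypothetical.

```python
from itertools import product


def generalize(r, conflict, domains):
    """Enumerate all realizations that agree with r on the conflict parameters."""
    params = sorted(domains)
    choices = [[r[k]] if k in conflict else list(domains[k]) for k in params]
    return [dict(zip(params, values)) for values in product(*choices)]


domains = {"X": ["s1", "s2"], "Y": ["t", "f"]}
r = {"X": "s1", "Y": "t"}

small_conflict = generalize(r, {"X"}, domains)       # Y is generalized
full_conflict = generalize(r, {"X", "Y"}, domains)   # nothing is generalized
print(len(small_conflict), len(full_conflict))
```

Each parameter missing from the conflict multiplies the number of pruned realizations by the size of its domain.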


4 A Smart Oracle with Counterexamples and Abstraction 


This section develops an oracle based on CEs, tailored for the use in an oracle- 
guided inductive synthesis loop described in Sect. 3. Its main features are: 
— a fast greedy approach to compute CEs that provide small conflicts: We 
achieve this by taking into account the position of the parameters. 
— awareness about the semantics of parameters by using model-checking results 
from an abstraction of the family. 
Before going into details, we provide some illustrative examples. 


A motivating example. First, we illustrate what it means to take CEs that 
lead to small conflicts. Consider Fig. 2, with a family member $D_r$ (left), where 
the superscript of a state identifier $s_i$ denotes parameters relevant to $s_i$. Consider 
the safety property $\varphi = \mathbb{P}_{\le 0.4}[\Diamond\{t\}]$. Clearly, $D_r \not\models \varphi$, and we can construct 
two CEs: $C_1 = \{s_0, s_3, t\}$ (center) and $C_2 = \{s_0, s_1, s_2, t\}$ (right) with conflicts 
$K_{C_1} = \{X, Y\}$ and $K_{C_2} = \{X\}$, respectively. It illustrates that a smaller CE 
does not necessarily induce a smaller conflict. 

We now illustrate awareness of the semantics of parameters. Consider the 
family $\mathcal{D} = (S, s_0, K', \mathcal{B})$, where $S = \{s_0, s_1, s_2, t, f\}$, the parameters are $K' = 
\{X, Y, T', F'\}$ with domains $V_X = \{s_1, s_2\}$, $V_Y = \{t, f\}$, $V_{T'} = \{t\}$, $V_{F'} = \{f\}$, 
and a family $\mathcal{B}$ of transition probability functions defined in Fig. 3 (left). As the 
parameters $T'$ and $F'$ each can take only one value, we consider $K = \{X, Y\}$ 
as the set of parameters. There are $|V_X| \times |V_Y| = 4$ family members, depicted 
in Fig. 3 (right). For conciseness, we omit some of the transition probabilities 
(recall that transition probabilities sum to one). Only realization $r_3$ satisfies the 
safety property $\varphi = \mathbb{P}_{\le 0.3}[\Diamond\{t\}]$. 

$\mathcal{B}(s_0) = [X \mapsto 1]$, 
$\mathcal{B}(s_1) = [T' \mapsto 0.6, Y \mapsto 0.2, F' \mapsto 0.2]$, 
$\mathcal{B}(s_2) = [T' \mapsto 0.2, Y \mapsto 0.2, F' \mapsto 0.6]$, 
$\mathcal{B}(t) = [T' \mapsto 1]$, 
$\mathcal{B}(f) = [F' \mapsto 1]$ 

Fig. 3. A family $\mathcal{D}$ of four Markov chains (unreachable states are grayed out). 
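
The claim can be verified by computing the probability of reaching $t$ from $s_0$ in each family member directly from the transition function of Fig. 3. The text fixes $r_0$ and $r_3$; the naming of $r_1$ and $r_2$ below is an assumption inferred from the later statement that generalizing $r_0$ over $Y$ yields $\{r_0, r_1\}$.

```latex
\begin{align*}
r_0 = [X \mapsto s_1,\ Y \mapsto t]: &\quad \mathbb{P}[D_{r_0}, s_0 \models \Diamond\{t\}] = 0.6 + 0.2 = 0.8 > 0.3,\\
r_1 = [X \mapsto s_1,\ Y \mapsto f]: &\quad \mathbb{P}[D_{r_1}, s_0 \models \Diamond\{t\}] = 0.6 > 0.3,\\
r_2 = [X \mapsto s_2,\ Y \mapsto t]: &\quad \mathbb{P}[D_{r_2}, s_0 \models \Diamond\{t\}] = 0.2 + 0.2 = 0.4 > 0.3,\\
r_3 = [X \mapsto s_2,\ Y \mapsto f]: &\quad \mathbb{P}[D_{r_3}, s_0 \models \Diamond\{t\}] = 0.2 \le 0.3.
\end{align*}
```

In each case, $s_0$ moves with probability one to $r(X)$, from which $t$ is reached via $T'$ and, if $r(Y) = t$, additionally via $Y$.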


CEGIS [9] illustrated: Consider running CEGIS, and assume the oracle gets 
realization $r_0$ first. A model checker reveals $\mathbb{P}[D_{r_0}, s_0 \models \Diamond T] = 0.8 > 0.3$. The 
CE for $D_{r_0}$ and $\varphi$ contains the (only) path to the target: $s_0 \to s_1 \to t$ having 
probability $0.8 > 0.3$. The corresponding CE $C = \{s_0, s_1, t\}$ induces the conflict 
$K_C = \{X, Y\}$. None of the parameters is generalized. The same argument applies 
to any subsequent realization: the constructed CEs do not allow for generalization, 
the oracle returns only the passed realization, and the learner keeps iterating 
until accidentally guessing $r_3$. 


Can we do better? To answer this, consider CE generation as a game: The 
Pruner creates a critical subsystem $C$. The Adversary wins if it finds an MC 
satisfying $\varphi$ containing $C$, thus refuting that $C$ is a counterexample. In our 
setting, we change the game: The Adversary must select a family member rather 
than an arbitrary MC. Analogously, off-the-shelf CE generators construct a 
critical subsystem $C$ that for every possible extension of $C$ is a CE. These 
are CEs without context. In our game, the Adversary may not extend the MC 
arbitrarily, but must choose a family member. These are CEs modulo a family. 


Back to the example: Observe that for a CE for $D_{r_0}$, we could omit states $t$ 
and $s_1$ from the set $C$ of critical states: we know for sure that, once $D_{r_0}$ takes 
transition $(s_0, s_1)$, it will reach target state $t$ with probability at least 0.6. This 
exceeds the threshold 0.3, regardless of the value of the parameter $Y$. Hence, for 
family $\mathcal{D}$, the set $C' = \{s_0\}$ is a critical subsystem. The immediate advantage is 
that this set induces conflict $K_{C'} = \{X\}$ (parameter $Y$ has been generalized). 
This enables us to reject all realizations from the set $r_0{\uparrow}K_{C'} = \{r_0, r_1\}$. It is 
'easier' to construct a CE for a (sub)family than for arbitrary MCs. More generally, 
a successful oracle needs to have access to useful bounds, and effectively integrate 
them in the CE generation. 
Counterexample construction. We develop an algorithm using bounds on 
reachability probabilities, similar to the bounds used above. Let us assume that for 
some set of realizations $R$ and for every state $s$, we have bounds $lb^R(s)$, $ub^R(s)$, 
such that for every $r \in R$ we have $lb^R(s) \le \mathbb{P}[D_r, s \models \Diamond T] \le ub^R(s)$. Such 
bounds always exist (take 0 and 1). We see later how we compute these bounds. 
In what follows, we fix $r$ and denote $D_r = (S, s_0, P)$. Let us assume $D_r$ violates 
a safety property $\varphi = \mathbb{P}_{\le\lambda}[\Diamond T]$. The following definition is central: 


Definition 6 (Rerouting). Let MC $D = (S, s_0, P)$ with $s_\top, s_\bot \notin S$, $C \subseteq S$ 
a set of expanded states and $\gamma\colon S \setminus C \to [0,1]$ a rerouting vector. The rerouting 
of MC $D$ w.r.t. $C$ and $\gamma$ is the MC $D{\downarrow}C[\gamma] = (S \cup \{s_\top, s_\bot\}, s_0, P^C_\gamma)$ with: 

$$P^C_\gamma(s) = \begin{cases} P(s) & \text{if } s \in C, \\ [s_\top \mapsto \gamma(s),\ s_\bot \mapsto 1 - \gamma(s)] & \text{if } s \in S \setminus C, \\ [s \mapsto 1] & \text{if } s \in \{s_\top, s_\bot\}. \end{cases}$$

Essentially, $D{\downarrow}C[\gamma]$ extends the MC $D$ with additional sink states $s_\top$ and $s_\bot$ 
and replaces all outgoing transitions of any non-expanded state $s \in S \setminus C$ by 
a transition leading to $s_\top$ (with probability $\gamma(s)$) and a complementary one to $s_\bot$. 
We consider $s_\top$ to be the new target and let $\varphi'$ denote the updated property. The 
transition $s \xrightarrow{\gamma(s)} s_\top$ may be considered a 'shortcut' that by-passes successors of 
$s$ and leads straight to target $s_\top$ with probability $\gamma(s)$. To ensure that $D{\downarrow}C[\gamma]$ 
is a CE, the value $\gamma(s)$ must be a lower bound on the reachability probability 
from $s$ in $D$. When constructing a CE for a single MC, we pick $\gamma = 0$, whereas 
when this MC is induced by a realization $r \in R$, we can safely pick $\gamma = lb^R$. The 
CE will be valid for every $r' \in R$. It is a CE-modulo-$R$. 

Algorithmically, we employ a state-exploration approach and therefore start 
with $C = \emptyset$, i.e., all states are initially rerouted. If this is a CE, we are 
done. Otherwise, if the rerouting $D{\downarrow}C[\gamma]$ satisfies $\varphi'$, then we 'expand' some 
states to obtain a CE. Naturally, we must expand reachable states to change the 
satisfaction of $\varphi'$. By expanding some state $s \in S$, we abandon the abstraction 
associated with the shortcut $s \xrightarrow{\gamma(s)} s_\top$ and replace it with concrete behavior that 
was inherent to state $s$ in MC $D$. Expanding a state cannot decrease the induced 
reachability probability as $lb^R$ is a valid lower bound. This gradual expansion 
of the reachable state space continues until for some $C \subseteq S$ the corresponding 
rerouting $D{\downarrow}C[\gamma]$ violates $\varphi'$. This gradual expansion process terminates as 
$D{\downarrow}S[\gamma] = D$ and our assumption is $D \not\models \varphi$. We show this process on an example. 
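
The rerouting construction can be sketched in Python as follows. This is a minimal illustration under assumptions: MCs are dictionaries of successor distributions, value iteration replaces a model checker, and the sink names are placeholders. The data are the MC $D_{r_0}$ of Fig. 3 and the lower bounds stated in Example 1.

```python
TOP, BOT = "s_top", "s_bot"   # fresh sink states (names are illustrative)


def reroute(P, expanded, gamma):
    """Keep the rows of expanded states; send all other states to the sinks."""
    Q = {s: dict(row) if s in expanded else {TOP: gamma[s], BOT: 1.0 - gamma[s]}
         for s, row in P.items()}
    Q[TOP], Q[BOT] = {TOP: 1.0}, {BOT: 1.0}
    return Q


def reach_prob(P, targets, s, sweeps=200):
    """Approximate the probability to reach targets from s by value iteration."""
    x = {u: 1.0 if u in targets else 0.0 for u in P}
    for _ in range(sweeps):
        x = {u: 1.0 if u in targets else sum(p * x[v] for v, p in P[u].items())
             for u in P}
    return x[s]


# D_{r0} from Fig. 3 (X -> s1, Y -> t) and the lower bounds from Example 1.
D_r0 = {"s0": {"s1": 1.0}, "s1": {"t": 0.8, "f": 0.2},
        "s2": {"t": 0.4, "f": 0.6}, "t": {"t": 1.0}, "f": {"f": 1.0}}
lb = {"s0": 0.2, "s1": 0.6, "s2": 0.2, "t": 1.0, "f": 0.0}
target = {"t", TOP}

p_none = reach_prob(reroute(D_r0, set(), lb), target, "s0")       # C = {}
p_init = reach_prob(reroute(D_r0, {"s0"}, lb), target, "s0")      # C = {s0}
p_all = reach_prob(reroute(D_r0, set(D_r0), lb), target, "s0")    # C = S
print(p_none, p_init, p_all)
```

With the threshold 0.3, the empty set is not critical, whereas expanding only $s_0$ already exceeds the threshold; expanding every state recovers the probability of $D_{r_0}$ itself.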


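Example 1. Reconsider $\mathcal{D}$ in Fig. 3 with $\varphi = \mathbb{P}_{\le 0.3}[\Diamond\{t\}]$. Using the method 
outlined below we get: $lb^R = [s_0 \mapsto 0.2, s_1 \mapsto 0.6, s_2 \mapsto 0.2, t \mapsto 1, f \mapsto 0]$. In 
absence of any bounds, the CE is $\{s_0, s_1, t\}$. Consider the gradual rerouting 
approach: We set $\gamma = lb^R$, $C^{(0)} = \emptyset$ and have $D^{(0)} := D_{r_0}{\downarrow}C^{(0)}[\gamma]$, see Fig. 4(a). 
Verifying this MC against $\varphi' = \mathbb{P}_{\le 0.3}[\Diamond(T \cup \{s_\top\})]$ yields $\mathbb{P}[D^{(0)}, s_0 \models \Diamond(T \cup \{s_\top\})] = 
\gamma(s_0) = 0.2 < 0.3$, i.e., the set $C^{(0)}$ is not a CE. We now expand the initial state, 
i.e., $C^{(1)} = \{s_0\}$ and let $D^{(1)} := D_{r_0}{\downarrow}C^{(1)}[\gamma]$, see Fig. 4(b). Verifying $D^{(1)}$ yields 
$\mathbb{P}[D^{(1)}, s_0 \models \Diamond(T \cup \{s_\top\})] = 1 \cdot \gamma(s_1) = 0.6 > 0.3$. Thus, the set $C^{(1)}$ is critical 
and the corresponding conflict is $K_{C^{(1)}} = supp(\mathcal{B}(s_0)) = \{X\}$. This is smaller than 
the naively computed conflict $\{X, Y\}$. 

Fig. 4. Finding a CE to $D_{r_0}$ and $\varphi$ from Fig. 3 using the rerouting vector $\gamma = lb^R$. 


Algorithm 1: Counterexample construction based on rerouting. 
Input : An MC $D_r$, a property $\varphi = \mathbb{P}_{\bowtie\lambda}[\Diamond T]$ s.t. $D_r \not\models \varphi$, a rerouting vector $\gamma$. 
Output: A conflict $K$ for $D_r$ and $\varphi$. 
1 $i \leftarrow 0$, $K^{(0)} \leftarrow \emptyset$ 
2 while true do 
3   $C^{(i)}, H^{(i)} \leftarrow$ reachableViaHoles($D_r$, $K^{(i)}$) 
4   $D^{(i)} \leftarrow D_r{\downarrow}C^{(i)}[\gamma]$ 
5   if $\mathbb{P}[D^{(i)}, s_0 \models \Diamond(T \cup \{s_\top\})] \not\bowtie \lambda$ then return $K^{(i)}$ 
6   $s \leftarrow$ chooseToExpand($H^{(i)}$, $K^{(i)}$) 
7   $K^{(i+1)} \leftarrow K^{(i)} \cup supp(\mathcal{B}(s))$ 
8   $i \leftarrow i + 1$ 
9 end while 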


[Figure 5: diagram of a Learner exchanging sub-families and bounds with an Abstr-Oracle and a CE-Oracle.]

Fig. 5. Conceptual hybrid (dual-oracle) synthesis


Greedy state expansion strategy Recall from Fig. 2 that for an MC $D_r$
with $D_r \not\models \varphi$, multiple CEs may exist inducing different conflicts. An efficient
expansion strategy should yield a CE that induces a small number of relevant
parameters (to prune more family members) and this CE is preferably obtained
by a small number of model-checking queries. The method presented in Alg. 1
meets these criteria. The algorithm expands multiple states between subsequent
model checks, while expanding only states that are associated with parameters
that are relevant. In particular, in each iteration, we keep track of the set $K^{(i)}$
of relevant parameters, optimistically starting with $K^{(0)} = \emptyset$. We compute (see
line 3) the set $C^{(i)}$ of states that are reachable from the initial state via states
which are associated only with relevant parameters in $K^{(i)}$, i.e., via states for
which $\mathrm{supp}(B(s)) \subseteq K^{(i)}$. Here, $H^{(i)}$ represents a state exploration 'horizon': the
set of states reachable from $C^{(i)}$ but containing some (still) irrelevant parameters.
We then construct the corresponding rerouting $D{\downarrow}C^{(i)}[\gamma]$ and check whether it is
a CE. Otherwise, we greedily choose a state $s$ from the horizon $H^{(i)}$ containing
the least number of irrelevant parameters and add these parameters to our
conflict (see line 7). The resulting conflict may not be minimal, but is computed
fast. Our algorithm applies to probabilistic liveness properties² too using $\gamma = \mathrm{ub}^R$.
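The greedy expansion of Alg. 1 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: the toy chain, the hole assignment `holes`, the bound vector `gamma`, the helper names, and the use of plain value iteration in place of a model checker.

```python
def reachable_via_holes(trans, holes, init, K):
    """Split reachable states into C (only relevant holes) and the horizon H."""
    C, H, stack = set(), set(), [init]
    while stack:
        s = stack.pop()
        if s in C or s in H:
            continue
        if holes[s] <= K:          # supp(B(s)) is a subset of K: expand s
            C.add(s)
            stack.extend(trans[s])
        else:                      # s still carries an irrelevant parameter
            H.add(s)
    return C, H


def reroute(trans, C, gamma):
    """States outside C jump to 'top' with probability gamma(s), else to 'bot'."""
    D = {s: dict(succ) if s in C else {"top": gamma[s], "bot": 1.0 - gamma[s]}
         for s, succ in trans.items()}
    D["top"], D["bot"] = {"top": 1.0}, {"bot": 1.0}
    return D


def reach_prob(trans, targets, init, sweeps=100):
    """Reachability probability by value iteration (ample for this toy chain)."""
    val = {s: 1.0 if s in targets else 0.0 for s in trans}
    for _ in range(sweeps):
        for s in trans:
            if s not in targets:
                val[s] = sum(p * val[t] for t, p in trans[s].items())
    return val[init]


def conflict(trans, holes, init, targets, gamma, lam):
    """Grow the conflict K until the rerouted chain violates P<=lam."""
    K = set()
    while True:
        C, H = reachable_via_holes(trans, holes, init, K)
        D = reroute(trans, C, gamma)
        if reach_prob(D, targets | {"top"}, init) > lam:
            return K
        # Assumes the original chain violates the property, so H is non-empty here.
        s = min(sorted(H), key=lambda h: len(holes[h] - K))
        K = K | holes[s]


# Hypothetical chain: the probability of reaching t from s0 is 0.6*0.5 + 0.4*0.1 = 0.34.
trans = {"s0": {"s1": 0.6, "s2": 0.4}, "s1": {"t": 0.5, "f": 0.5},
         "s2": {"t": 0.1, "f": 0.9}, "t": {"t": 1.0}, "f": {"f": 1.0}}
holes = {"s0": {"X"}, "s1": {"Y"}, "s2": set(), "t": set(), "f": set()}
gamma = {"s0": 0.2, "s1": 0.45, "s2": 0.1, "t": 1.0, "f": 0.0}  # assumed lower bounds
result = conflict(trans, holes, "s0", {"t"}, gamma, 0.3)
```

In this toy run `result` is the conflict {X}: once s0 is expanded, the rerouted chain reaches the target with probability 0.6 * 0.45 + 0.4 * 0.1 = 0.31 > 0.3, so the parameter Y never has to be declared relevant.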


Computing bounds We compute $\mathrm{lb}^R$ and $\mathrm{ub}^R$ using an abstraction [10]. The
method considers some set $R$ of realizations and computes the corresponding
quotient Markov decision process (MDP) that over-approximates the behavior of
all MCs in the family $R$. Model checking this MDP yields an upper and a lower
bound of the induced probabilities for all states over all realizations in $R$. That
is, Bound($D$, $R$) computes $\mathrm{lb}^R \in \mathbb{R}^S$ and $\mathrm{ub}^R \in \mathbb{R}^S$ such that for each $s \in S$:


$$\mathrm{lb}^R(s) \;\leq\; \min_{r \in R} P[D_r, s \models \Diamond T] \;\leq\; \max_{r \in R} P[D_r, s \models \Diamond T] \;\leq\; \mathrm{ub}^R(s).$$


To allow for refinement, two properties are crucial (with point-wise inequalities):
1. $\mathrm{lb}^R \leq \mathrm{lb}^{R'} \wedge \mathrm{ub}^R \geq \mathrm{ub}^{R'}$ for $R' \subseteq R$, and 2. $\mathrm{lb}^{\{r\}} = \mathrm{ub}^{\{r\}}$ for $r \in R$.


In [10], the abstraction and refinement together define an abstraction-refinement 
loop (AR) that addresses the feasibility problem. In the worst case, this loop 
analyses $2 \cdot |R|$ quotient MDPs, which (as of now) may be arbitrarily larger than
the number of family members they represent. 
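To make the role of the quotient MDP concrete, the following sketch computes minimal and maximal reachability probabilities on a tiny MDP by value iteration. It is only an illustration under stated assumptions: the MDP, the state names and the function `mdp_bounds` are hypothetical, and a real implementation would rely on a model checker that returns sound bounds rather than a fixed number of sweeps.

```python
def mdp_bounds(mdp, targets, sweeps=200):
    """Min/max reachability per state by value iteration on a small MDP."""
    lb = {s: 1.0 if s in targets else 0.0 for s in mdp}
    ub = dict(lb)
    for _ in range(sweeps):
        for s, actions in mdp.items():
            if s in targets:
                continue
            # Each action is one parameter option available in state s.
            lb[s] = min(sum(p * lb[t] for t, p in a.items()) for a in actions)
            ub[s] = max(sum(p * ub[t] for t, p in a.items()) for a in actions)
    return lb, ub


# Hypothetical family with one hole in s0: option 1 moves to s1, option 2 to s2.
mdp = {"s0": [{"s1": 1.0}, {"s2": 1.0}],
       "s1": [{"t": 0.5, "f": 0.5}],
       "s2": [{"t": 0.1, "f": 0.9}],
       "t": [{"t": 1.0}], "f": [{"f": 1.0}]}
lb, ub = mdp_bounds(mdp, {"t"})
```

For this acyclic toy model the iteration is exact and brackets both family members: `lb["s0"]` is 0.1 and `ub["s0"]` is 0.5. Because an MDP scheduler may pick different options for the same parameter in different states, such bounds can be loose on larger families, which is what motivates refinement.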


5 Hybrid Dual-Oracle Synthesis 


We introduce an extended synthesis loop in which the abstraction-based reasoning 
is used to prune the family R, and to accelerate the CE-based oracle from Sect. 4. 
The intuitive idea is outlined in Fig. 5. Note that if the CE-based oracle is not
exploited, we emulate AR (explained under 'Computing bounds' above), whereas if
the abstraction oracle is not used, we emulate CEGIS (with the novel oracle).
Let us motivate combining these oracles in a flexible way. The naive version
outlined in the previous section assumes a single abstraction step, and invokes
CEGIS with the bounds obtained from that step. Evidently, the better (tighter)
the bounds $\gamma$, the better the CEs. However, the abstraction-based bounds for $R$
may be very loose. These bounds can be improved by splitting the set R and 
using the bounds on the two sub-families. The idea is to run a limited number of 


2 Some care is required regarding loops, see [9].


Algorithm 2: Hybrid (dual-oracle) synthesis.
Input : A family $\mathcal{D}$, a reachability property $\varphi$.
Output: Either a member $r$ in $\mathcal{D}$ with $r \models \varphi$, or no such $r$ exists in $\mathcal{D}$
 1  $\mathcal{R} \leftarrow \{R^{\mathcal{D}}\}$                       // each analysed (sub-)family also holds bounds
 2  $\delta_{CEGIS} \leftarrow 1$                                      // time allocation factor for CEGIS
 3  while true do
 4      result, $\mathcal{R}'$, $\sigma_{AR}$, $t_{AR}$ $\leftarrow$ AR.run($\mathcal{R}$, $\varphi$)
 5      if result.decided() then return result
 6      CEGIS.setTimeout($t_{AR} \cdot \delta_{CEGIS}$)
 7      result, $\mathcal{R}''$, $\sigma_{CEGIS}$ $\leftarrow$ CEGIS.run($\mathcal{R}'$, $\varphi$)
 8      if result.decided() then return result
 9      $\delta_{CEGIS} \leftarrow \sigma_{CEGIS} / \sigma_{AR}$
10      $\mathcal{R} \leftarrow \mathcal{R}''$
11  end while


AR steps and then invoke CEGIS. Our experiments reveal that it can be crucial 
to be adaptive, i.e., the integrated method must be able to detect at run time 
when to switch. 


The proposed hybrid method switches between AR and CEGIS, where we
allow for refining during the AR phase and use the obtained refined bounds
during CEGIS. Additionally, we estimate the efficiency $\sigma$ (e.g., the number of
pruned MCs per time unit) of the two methods and allocate more time $t$ to the
method with superior performance. That is, if we detect that CEGIS prunes
sub-families twice as fast as AR, we double the time in the next round for
CEGIS. The resulting algorithm is summarized in Alg. 2. Recall that AR (at
line 4) takes one family from $\mathcal{R}$, either solves it or splits it and returns the set
of undecided families $\mathcal{R}'$. In contrast, CEGIS processes multiple families from
$\mathcal{R}'$ until the timeout and then returns the set of undecided families $\mathcal{R}''$. This
workflow is motivated by the fact that one iteration of AR (i.e., the involved
MDP model checking) is typically significantly slower than one CEGIS iteration.
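The adaptive time allocation can be sketched in a few lines of Python. This is a schematic illustration, not the authors' implementation: the two oracles are passed in as callables, and `ar_stub`, `cegis_stub` and their fixed efficiencies are hypothetical stand-ins that merely drop families from a list.

```python
def hybrid(families, ar_run, cegis_run):
    """Alternate AR and CEGIS; CEGIS time scales with its relative efficiency."""
    delta = 1.0                                   # time allocation factor for CEGIS
    while True:
        result, families, sigma_ar, t_ar = ar_run(families)
        if result is not None:
            return result
        result, families, sigma_cegis = cegis_run(families, t_ar * delta)
        if result is not None:
            return result
        delta = sigma_cegis / sigma_ar            # assumes sigma_ar > 0


timeouts = []                                     # CEGIS timeouts, kept for inspection


def ar_stub(families):
    """Hypothetical AR oracle: decides one family in 2 time units, efficiency 1."""
    rest = families[1:]
    return ("unsat" if not rest else None), rest, 1.0, 2.0


def cegis_stub(families, timeout):
    """Hypothetical CEGIS oracle: drops int(timeout) families, reports efficiency 2."""
    timeouts.append(timeout)
    rest = families[int(timeout):]
    return ("unsat" if not rest else None), rest, 2.0


outcome = hybrid(list(range(10)), ar_stub, cegis_stub)
```

With these stubs CEGIS reports twice the efficiency of AR, so its timeout doubles from 2.0 in the first round to 4.0 afterwards, mirroring the rule described above.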


Remark 1. Although the developed framework for integrated synthesis has been 
discussed in the context of feasibility with respect to a single property $\varphi$, it
can be easily generalized to handle multiple-property specifications as well as 
to treat optimal synthesis. Regarding multiple properties, the idea remains the 
same: Analyzing the quotient MDP with respect to multiple properties yields 
multiple probability bounds. After initiating a CEGIS-loop and obtaining an 
unsatisfiable realization, we can construct a separate conflict for each unsatisfied 
property, while using the corresponding probability bound to enhance the CE 
generation process. Optimal synthesis is handled similarly to feasibility, but, after 
obtaining a satisfiable solution, we update the optimizing property to exclude this 
solution: e.g., for maximal synthesis this translates to increasing the threshold of 
the maximizing property. Having exhausted the search space of family members, 
the last obtained solution is declared to be the optimal one. 
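The threshold update for maximal synthesis can be sketched as a loop around a feasibility oracle. The sketch below is only illustrative: `find_above` is a hypothetical stand-in for the synthesis loop (it scans a dictionary of precomputed values), and the member names and values are invented.

```python
def maximal_synthesis(family, value, find_above):
    """Repeat feasibility queries, raising the threshold after every solution."""
    best, threshold = None, float("-inf")
    while True:
        member = find_above(family, threshold)
        if member is None:          # search space exhausted: last solution is optimal
            return best
        best, threshold = member, value(member)


# Hypothetical family members with the value of the maximized property.
values = {"r1": 0.2, "r2": 0.5, "r3": 0.3}


def find_above(family, threshold):
    """Stand-in feasibility oracle: first member whose value exceeds the threshold."""
    return next((m for m in family if values[m] > threshold), None)


best = maximal_synthesis(list(values), values.get, find_above)
```

Here the oracle first returns r1, then r2 after the threshold is raised to 0.2, and finally reports infeasibility for the threshold 0.5, so `best` is r2.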
model     |K|   |R^D|   MDP size   avg. MC size
Grid        8   65k     11.5k      1.2k
Maze       20   1M      9k         5.4k
DPM        16   43M     9.5k       2.2k
Pole       17   1.3M    6.6k       5.6k
Herman      8   0.5k    48k        5.2k
Herman*     7   3.1M    6k         1k


Table 1. Summary of the benchmarks and their statistics


6 Experimental evaluation


Implementation. We implemented the hybrid oracle on top of the probabilistic
model checker Storm [18]. While the high-performance parts were implemented
in C++, we used a Python API to flexibly construct the overall synthesis loop.
For SMT solving, we used Z3 [29]. The tool chain takes a PRISM [27] or JANI [6]
sketch and a set of temporal properties, and returns a satisfying realization if
one exists, or reports that no such realization exists. The implementation
in the form of an artefact is available at https://zenodo.org/record/4422543.


Set-up. We compare the adaptive oracle-guided synthesis with two state-of-the-art 
synthesis methods: program-level CEGIS [9] using a MaxSat CE generation [16,41] 
and AR [10]. These use the same architecture and data structures from Storm. 
All experiments are run on an Ubuntu 19.04 machine with Intel i5-8300H (4 
cores at 2.3 GHz) and using up to 8 GB RAM, with all the algorithms being 
executed on a single thread. The benchmarks consist of five different models,
see Table 1, from various domains that were used in [9,10]. As opposed to the 
benchmark considered in [9,10], we use larger variants of Grid and Herman to 
better demonstrate differences in the performance of individual methods. 

To investigate the scalability of the methods, we consider a new variant of the 
Herman model, that allows us to scale the number of randomization strategies 
and thus the family size. In particular, we will compare performance on two 
instances of different sizes: small Herman* (5k members) and large Herman* 
(3.1M members, other statistics are reported in Table 1). 

To reason about the pruning efficiency of different synthesis methods, we 
want to avoid feasible synthesis problems, where the order of family exploration 
can lead to inconsistent performance. Instead, we will primarily focus on non- 
feasible problems, where all realizations need to be explored in order to prove 
unsatisfiability. The experimental evaluation is presented in three parts. (1) We 
evaluate the novel CE construction method and compare it with the MaxSat-based 
oracle from [9]. (2) We compare the hybrid synthesis loop with the two baselines 
AR and CEGIS. (3) We consider novel hard synthesis instances (multi-property 
synthesis, finding optimal programs) on instances of the model Herman*.


Comparing CE construction methods We consider the quality of the CEs 
and their generation time. In particular, we want to investigate (1) whether using
CEs-modulo-families yields better CEs, and (2) how the quality of CEs from the smart
oracle compares to the MaxSat-based oracle, and how their time consumption
compares. As a measure of quality of a CE, we take the average number of its relevant
parameters relative to the total number of its parameters. That is, smaller


204 R. Andriushchenko et. al. 


         |                 CE quality                  |                    performance
model    | MaxSat [16]   | state expansion (new)       | CEGIS [9]     | AR [10]       | Hybrid (new)
         |               | trivial        non-trivial  | iters   time  | iters   time  | iters         time
Grid     | 0.59 (0.025)  | 0.50 (0.001)   0.50         | 613     30    | 5325    486   | (285, 11)     6
  *      | 0.74 (0.026)  | 0.65 (0.001)   0.65         | 1801    93    | 6139    540   | (2100, 127)   33
Maze     | 0.21 (0.247)  | 0.55 (0.009)   0.38         | 290     5449  | 49      17    | (105, 13)     7
  *      | 0.24 (2.595)  | 0.63 (0.012)   0.46         | 301     6069  | 63      26    | (146, 17)     9
DPM      | 0.32 (0.447)  | 0.61 (0.007)   0.53         | 2906    2488  | 299     25    | (631, 143)    23
  *      | 0.33 (0.525)  | 0.49 (0.006)   0.42         | 3172    2782  | 1215    81    | (2374, 545)   76
Pole     | -             | 0.87 (0.062)   0.16         | -       -     | 309     12    | (3, 5)        1
  *      | -             | 0.54 (0.041)   0.29         | -       -     | 615     23    | (80, 61)      6
Herman   | -             | 0.91 (0.011)   0.50         | -       -     | 171     86    | (24, 1)       9
  *      | -             | 0.88 (0.016)   0.87         | -       -     | 643     269   | (485, 13)     29


Table 2. CE quality for different methods and performance of three synthesis methods. 
For each model/property, we report results for two different thresholds where the 
symbol ‘*’ marks the one closer to the feasibility threshold, representing the more
difficult synthesis problem. Symbol ‘-’ marks a two-hour timeout. CE quality: The 
presented numbers give the CE quality (i.e., the smaller, the better). The numbers in 
parentheses represent the average run-time of constructing one CE in seconds (run-times 
for constructing CE using non-trivial bounds are similar as for trivial ones and are thus 
not reported). Performance: for each method, we report the number of iterations (for 
the hybrid method, the reported values are iterations of the CEGIS and AR oracle, 
respectively) and the run-time in seconds. 


ratios imply better CEs. To measure the influence of using CEs-modulo-families, 
two types of bounds are used: (i) trivial bounds (i.e., $\gamma = 0$ for safety and $\gamma = 1$
for liveness properties), and (ii) non-trivial bounds corresponding to the entire
family $R^{\mathcal{D}}$ representing the most conservative estimate. The results are reported
in (the left part of) Table 2. In the next subsection, we investigate this same 
benchmark from the point of view of the performance of the synthesis methods, 
which also shows the immediate effect of the new CE generation strategy. 


The first observation is that using non-trivial bounds (as opposed to trivial 
ones) for the state expansion approach can drastically decrease the conflict 
size. It turns out that the CEs obtained using the greedy approach are mostly 
larger than those obtained with the MaxSat method. However (see Grid), even 
for trivial bounds, we may obtain smaller CEs than for MaxSat: computing 
a minimal-command CE does not necessarily induce an optimal conflict. On 
the other hand, comparing the run-times in the parentheses, one can see that 
computing CEs via the greedy state expansion is orders of magnitude faster than 
computing command-optimal ones using MaxSat. It is good to realize that the 
greedy method makes at most $|K|$ model-checking queries to compute CEs, while
the MaxSat method may make exponentially many such queries. Overall, the
greedy method using the non-trivial bounds is able to obtain CEs of comparable
quality to the MaxSat method, while being orders of magnitude faster.


Performance comparison with AR/CEGIS We compare the hybrid synthesis
loop from Sect. 5 with two state-of-the-art baselines: CEGIS and AR. The
results are displayed in (the right half of) Table 2. In all 10 cases, the hybrid
method outperforms the baselines. It is up to an order of magnitude faster.

Let us discuss the performance of the hybrid method. We classify benchmarks
along two dimensions: (1) the performance of CEGIS and (2) the performance of
AR. Based on the empirical performance, we classify Grid as good-for-CEGIS
(and not for AR), Maze, Pole and DPM as good-for-AR (and not for CEGIS),
and Herman as hard (for both). Roughly, AR works well when the quotient
MDP does not blow up and its analysis is precise due to consistent schedulers,
i.e., when the parameter dependencies are not crucial for a precise analysis.
CEGIS performs well when the CEs are small and fast to compute. On the other
hand, synthesis problems for which neither pure CEGIS nor pure AR is able to
effectively reason about non-trivial subfamilies inherently profit from a hybrid
method. The main point we want to discuss is how the hybrid method reinforces
the strengths of both methods, rather than their weaknesses.

In the hybrid method, there are two factors that determine the efficiency:
(i) how fast do we get bounds on the reachability probability that are tight enough to
enable construction of good counterexamples? and (ii) how good are the constructed
counterexamples? The former factor is attributed to the proposed adaptive scheme
(see Alg. 2), where the method will prefer AR-like analysis and continue refinement
until the computed bounds allow construction of small counterexamples. The
latter is reflected above. Let us now discuss how these two aspects are reflected
in the benchmarks.

In good-for-CEGIS benchmarks like Grid, after analyzing a quotient MDP 
for the whole family, the hybrid method mostly profits from better CEs yielding 
better bounds, thus outperforming CEGIS. Indeed, the CEs are found so fast 
that the bottleneck is no longer their generation. This also explains why the 
speedup is not immediately translated to the speedup on the overall synthesis 
loop. In the good-for-AR benchmark DPM, the hybrid method provides only a 
minor improvement as it has to perform a large number of AR-iterations before 
the novel CE-based pruning can be effectively used. This can be considered as the 
worst-case scenario for the hybrid method. On other good-for-AR benchmarks
like Maze and Pole, the good performance of AR allows us to quickly obtain tight
bounds which can then be exploited by CEGIS. Finally, in hard models like
Herman, abstraction-refinement is very expensive, but even the first round
yields bounds that, as opposed to the trivial bounds, enable good
CEs: CEGIS can keep using these bounds to quickly prune the state space.


More complicated synthesis problems Our new approach can push the 
limits of synthesis benchmarks significantly. We illustrate this by considering a 
new variant of the Herman model, Herman*, and a property imposing an upper 
bound on the expected number of rounds until stabilization. We put this bound 
just below the optimal (i.e., the minimal) value, yielding a hard non-feasible 
problem. The synthesis results are summarized in Table 3. As CEGIS performs
poorly on Herman, it is excluded here.


synthesis      |     AR       |     Hybrid       || synthesis     |     AR       |      Hybrid
problem        | iters  time  | iters      time  || problem       | iters  time  | iters        time
feasibility    | 81     30s   | (274, 1)   7s    || feasibility   | 69k    47h   | (14280, 2)   13.4m
two properties | 97     38s   | (274, 1)   8s    || optimality    | 83k    55h   | (16197, 3)   16.8m
optimality     | 531    150s  | (571, 7)   12s   || 5%-optimality | 60k    42h   | (6421, 7)    5.1m


Table 3. The impact of scaling the family size (of the Herman* model) and handling
more complex synthesis problems. The left part shows the results for the smaller variant
(5k members), the right part for the larger one (3.1M members).


First, we investigate on small Herman* how the methods can handle the 
synthesis for multi-property specifications. We add one feasible property to the 
(still non-feasible) specification (row ‘two properties’). While including more 
properties typically slows down the AR computation, the performance of the 
hybrid method is not affected as the corresponding overhead is mitigated by 
additional pruning opportunities. Second, we consider optimal synthesis for the 
property as used in the feasibility synthesis. The hybrid method requires only 
a minor overhead to find an optimal solution compared to checking feasibility. 
This overhead is significantly larger for AR. 

Next, we consider the larger Herman* model having significantly more randomization
strategies (3.1M members) that include solutions leading to a considerably
faster stabilization. This model is out of reach for existing synthesis approaches:
one-by-one enumeration takes more than 27 hours and AR performs even
worse, requiring 47 and 55 hours to solve the feasibility and optimality problems,
respectively. On the other hand, the proposed hybrid method is able to solve
these problems within minutes. Finally, we consider a relaxed variant of optimal
synthesis (5%-optimality) guaranteeing that the found solution is at most 5% worse
than the optimal one. Relaxing the optimality criterion speeds up the hybrid synthesis
method by about a factor of three.
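As a quick check of the factors quoted above, the run-times reported in Table 3 for the large Herman* instance can be compared directly. The conversion of hours to minutes and the rounding are ours:

```latex
\frac{47\,\mathrm{h}}{13.4\,\mathrm{min}} = \frac{2820}{13.4} \approx 210,
\qquad
\frac{55\,\mathrm{h}}{16.8\,\mathrm{min}} = \frac{3300}{16.8} \approx 196,
\qquad
\frac{16.8\,\mathrm{min}}{5.1\,\mathrm{min}} \approx 3.3 .
```

The first two ratios compare AR with the hybrid method on feasibility and optimality, and the last one compares exact with 5%-optimal hybrid synthesis.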

These experiments clearly demonstrate that scaling up the synthesis problem
by several orders of magnitude renders existing synthesis methods infeasible: they
need tens of hours to solve the synthesis problems. Meanwhile, the hybrid method 
tackles these difficult synthesis problems without significant penalty and is capable 
of producing a solution within minutes. 


7 Conclusion 


We present a novel method for the automated synthesis of probabilistic programs. 
Pairing the counterexample-guided inductive synthesis with the deductive oracle 
using an MDP abstraction, we develop a synthesis technique enabling faster 
construction of smaller counterexamples. Evaluating the method on case studies 
from different domains, we demonstrate that the novel CE construction and the 
adaptive strategy lead to a significant acceleration of the synthesis process. The 
proposed method is able to reduce the run-time for challenging problems from 
days to minutes. In our future work, we plan to investigate counterexamples on 
the quotient MDPs and improve the abstraction refinement strategy.


Abstract. Many probabilistic inference problems such as stochastic filtering
or the computation of rare event probabilities require model analysis
under initial and terminal constraints. We propose a solution to
this bridging problem for the widely used class of population-structured 
Markov jump processes. The method is based on a state-space lumping 
scheme that aggregates states in a grid structure. The resulting approxi- 
mate bridging distribution is used to iteratively refine relevant and trun- 
cate irrelevant parts of the state-space. This way, the algorithm learns 
a well-justified finite-state projection yielding guaranteed lower bounds 
for the system behavior under endpoint constraints. We demonstrate the 
method’s applicability to a wide range of problems such as Bayesian 
inference and the analysis of rare events. 


Keywords: Bayesian Inference · Bridging problem · Smoothing · Lumping · Rare Events.


1 Introduction 


Discrete-valued continuous-time Markov Jump Processes (MJP) are widely used 
to model the time evolution of complex discrete phenomena in continuous time. 
Such problems naturally occur in a wide range of areas such as chemistry [16],
systems biology [49,46], epidemiology [36] as well as queuing systems [10] and
finance [39]. In many applications, an MJP describes the stochastic interaction 
of populations of agents. The state variables are counts of individual entities of 
different populations. 

Many tasks, such as the analysis of rare events or the inference of agent 
counts under partial observations naturally introduce terminal constraints on 
the system. In these cases, the system’s initial state is known, as well as the 
system’s (partial) state at a later time-point. The probabilities corresponding 
to this so-called bridging problem are often referred to as bridging probabilities 
[79]. For instance, if the exact, full state of the process $X_t$ has been observed
at times 0 and $T$, the bridging distribution is given by


$$\Pr(X_t = x \mid X_0 = x_0, X_T = x_g)$$


© The Author(s) 2021
J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 210-229, 2021.
for all states $x$ and times $t \in [0, T]$. Often, the condition is more complex, such
that in addition to an initial distribution, a terminal distribution is present. 
Such problems typically arise in a Bayesian setting, where the a priori behavior 
of a system is filtered such that the posterior behavior is compatible with noisy, 
partial observations [11,25]. For example, time-series data of protein levels is
available while the mRNA concentration is not [125]. In such a scenario our 
method can be used to identify a good truncation to analyze the probabilities 
of mRNA levels. 

Bridging probabilities also appear in the context of rare events. Here, the rare 
event is the terminal constraint because we are only interested in paths containing
the event. Typically, researchers have to resort to Monte Carlo simulations in
combination with variance reduction techniques in such cases [14,26].

Efficient numerical approaches that are not based on sampling or ad-hoc 
approximations have rarely been developed. 

Here, we combine state-of-the-art truncation strategies based on a forward 
analysis [28,4] with a refinement approach that starts from an abstract MJP with
lumped states. We base this lumping on a grid-like partitioning of the state-space. 
Throughout a lumped state, we assume a uniform distribution that gives an 
efficient and convenient abstraction of the original MJP. Note that the lumping 
does not follow the classical paradigm of Markov chain lumpability or its 
variants [15]. Instead of an approximate block structure of the transition-matrix 
used in that context, we base our partitioning on a segmentation of the molecule 
counts. Moreover, during the iterative refinement of our abstraction, we identify 
those regions of the state-space that contribute most to the bridging distribution. 
In particular, we refine those lumped states that have a bridging probability 
above a certain threshold $\delta$ and truncate all other macro-states. This way, the
algorithm learns a truncation capturing most of the bridging probabilities. This 
truncation provides guaranteed lower bounds because it is at the granularity of 
the original model. 

In the rest of the paper, after presenting related work (Section 2) and background
(Section 3), we discuss the method (Section 4) and several applications,
including the computation of rare event probabilities as well as Bayesian smoothing
and filtering (Section 5).


2 Related Work 


The problem of endpoint constrained analysis occurs in the context of Bayesian
estimation. For population-structured MJPs, this problem has been addressed
by Huang et al. using moment closure approximations and by Wildner
and Köppl [48] further employing variational inference. Golightly and Sherlock
modified stochastic simulation algorithms to approximately augment generated
trajectories [17]. Since a statistically exact augmentation is only possible
for few simple cases, diffusion approximations [18] and moment approximations
have been employed. Such approximations, however, do not give any guarantees
on the approximation error and may suffer from numerical instabilities [43].

The bridging problem also arises during the estimation of first passage times
and rare event analysis. Approaches for first-passage times are often of heuristic
nature [42,22,8]. Rigorous approaches yielding guaranteed bounds are currently
limited by the performance of state-of-the-art optimization software [6]. In bi- 
ological applications, rare events of interest are typically related to the reach- 
ability of certain thresholds on molecule counts or mode switching [45]. Most 
methods for the estimation of rare event probabilities rely on importance sampling
[26,14]. For other queries, alternative variance reduction techniques such
as control variates are available [5]. Apart from sampling-based approaches, dy- 
namic finite-state projections have been employed by Mikeev et al. [34], but are 
lacking automated truncation schemes. 

The analysis of countably infinite state-spaces is often handled by a pre- 
defined truncation [27]. Sophisticated state-space truncations for the (uncondi- 
tioned) forward analysis have been developed to give lower bounds and rely on a 
trade-off between computational load and tightness of the bound B728/4[24]37). 

Reachability analysis, which is relevant in the context of probabilistic veri- 
fication [838], is a bridging problem where the endpoint constraint is the visit 
of a set of goal states. Backward probabilities are commonly used to compute 
reachability likelihoods [2,50]. Approximate techniques for reachability, based
on moment closure and stochastic approximation, have also been developed in 
[BD], but lack error guarantees. There is also a conceptual similarity between 
computing bridging probabilities and the forward-backward algorithm for com- 
puting state-wise posterior marginals in hidden Markov models (HMMs) [40]. 
Like MJPs, HMMs are a generative model that can be conditioned on obser- 
vations. We only consider two observations (initial and terminal state) that are 
not necessarily noisy but the forward and backward probabilities admit the same 
meaning. 


3 Preliminaries 


3.1 Markov Jump Processes with Population Structure 


A population-structured Markov jump process (MJP) describes the stochastic
interactions among agents of distinct types in a well-stirred reactor. The assumption
that all agents are equally distributed in space allows us to keep track only
of the overall copy number of agents for each type. Therefore the state-space is
$S \subseteq \mathbb{N}^{n_S}$ where $n_S$ denotes the number of agent types or populations. Interactions
between agents are expressed as reactions. These reactions have associated
gains and losses of agents, given by non-negative integer vectors $v_j^+$ and $v_j^-$ for
reaction $j$, respectively. The overall effect is given by $v_j = v_j^+ - v_j^-$. A reaction
between agents of types $S_1, \dots, S_{n_S}$ is specified in the following form:


$$\sum_{\ell=1}^{n_S} v_{j\ell}^- S_\ell \;\xrightarrow{\;\alpha_j(x)\;}\; \sum_{\ell=1}^{n_S} v_{j\ell}^+ S_\ell. \tag{1}$$


The propensity function $\alpha_j$ gives the rate of the exponentially distributed firing
time of the reaction as a function of the current system state $x \in S$. In population
models, mass-action propensities are most common. In this case the firing rate
is given by the product of the number of reactant combinations in $x$ and a rate
constant $c_j$, i.e.


$$\alpha_j(x) = c_j \prod_{\ell=1}^{n_S} \binom{x_\ell}{v_{j\ell}^-}. \tag{2}$$


In this case, we give the rate constant in (1) instead of the function $\alpha_j$. For a
given set of $n_R$ reactions, we define a stochastic process $\{X_t\}_{t \geq 0}$ describing the
evolution of the population sizes over time $t$. Due to the assumption of exponentially
distributed firing times, $X$ is a continuous-time Markov chain (CTMC) on
$S$ with infinitesimal generator matrix $Q$, where the entries of $Q$ are


$$Q_{x,y} = \begin{cases} \sum_{j :\, x + v_j = y} \alpha_j(x), & \text{if } x \neq y, \\ -\sum_{j=1}^{n_R} \alpha_j(x), & \text{otherwise.} \end{cases} \tag{3}$$
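The construction of the generator from mass-action propensities can be sketched in Python for a single-species model on a truncated state-space. The function names, the reaction encoding `(c, loss, gain)` and the rate constants 10 and 0.1 are assumptions made for this illustration. Transitions that leave the truncation are dropped from the off-diagonal but still counted on the diagonal, which is one possible design choice and makes boundary rows lose probability mass.

```python
from math import comb


def mass_action(c, x, loss):
    """alpha_j(x) = c_j * prod_l binom(x_l, v_jl^-), cf. Eq. (2)."""
    rate = c
    for x_l, v_l in zip(x, loss):
        rate *= comb(x_l, v_l)
    return rate


def generator(n_max, reactions):
    """Generator restricted to the states 0..n_max of a single-species model.

    Each reaction is a triple (c, loss, gain) of rate constant and agent counts.
    """
    size = n_max + 1
    Q = [[0.0] * size for _ in range(size)]
    for x in range(size):
        for c, loss, gain in reactions:
            rate = mass_action(c, (x,), (loss,))
            y = x + gain - loss
            Q[x][x] -= rate              # diagonal: minus the total exit rate
            if 0 <= y <= n_max:
                Q[x][y] += rate          # off-diagonal: only inside the truncation
    return Q


# Birth-death process with assumed rate constants 10 (birth) and 0.1 (death).
Q = generator(100, [(10.0, 0, 1), (0.1, 1, 0)])
```

In state 5, for instance, the birth reaction fires at rate 10 and the death reaction at rate 0.1 * 5 = 0.5, so the diagonal entry is -10.5.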


The probability distribution over time can be analyzed as an initial value problem.
Given an initial state $x_0$, the distribution¹

$$\pi(x_i, t) = \Pr(X_t = x_i \mid X_0 = x_0), \quad t \geq 0 \tag{4}$$

evolves according to the Kolmogorov forward equation

$$\frac{d}{dt}\pi(t) = \pi(t)\,Q, \tag{5}$$

where $\pi(t)$ is an arbitrary vectorization $(\pi(x_1, t), \pi(x_2, t), \dots, \pi(x_{|S|}, t))$ of the
states.
Let $x_g \in S$ be a fixed goal state. Given the terminal constraint $\Pr(X_T = x_g)$
for some $T \geq 0$, we are interested in the so-called backward probabilities

$$\beta(x_i, t) = \Pr(X_T = x_g \mid X_t = x_i), \quad t \leq T. \tag{6}$$


Note that $\beta(\cdot, t)$ is a function of the conditional event and thus is no probability
distribution over the state-space. Instead $\beta(\cdot, t)$ gives the reaching probabilities
for all states over the time span of $[t, T]$. To compute these probabilities, we can
employ the Kolmogorov backward equation

$$\frac{d}{dt}\beta(t) = Q\,\beta(t)^{\top}, \tag{7}$$

where we use the same vectorization to construct $\beta(t)$ as we used for $\pi(t)$. The
above equation is integrated backwards in time and yields the reachability probability
for each state $x_i$ and time $t < T$ of ending up in $x_g$ at time $T$.


1 In the sequel, $x_i$ denotes a state with index $i$ instead of its $i$-th component.

The state-space of many MJPs with population structure, even simple ones,
is countably infinite. In this case, we have to truncate the state-space to a rea- 
sonable finite subset. The choice of this truncation heavily depends on the goal of 
the analysis. If one is interested in the most “common” behavior, for example, a 
dynamic mass-based truncation scheme is most appropriate [32]. Such a scheme 
truncates states with small probability during the numerical integration. How- 
ever, common mass-based truncation schemes are not as useful for the bridging 
problem. This is because trajectories that meet the specific terminal constraints 
can be far off the main bulk of the probability mass. We solve this problem by 
a state-space lumping in connection with an iterative refinement scheme. 

Consider as an example a birth-death process. This model can be used to 
model a wide variety of phenomena and often constitutes a sub-module of larger 
models. For example, it can be interpreted as an M/M/1 queue with service rates 
being linearly dependent on the queue length. Note that even for this simple
model, the state-space is countably infinite. 


Model 1 (Birth-Death Process). The model consists of exponentially distributed
arrivals and service times proportional to queue length. It can be expressed
using two mass-action reactions:


$$\varnothing \to X \quad \text{and} \quad X \to \varnothing.$$


The initial condition $X_0 = 0$ holds with probability one.


3.2 Bridging Distribution 


The process’ probability distribution given both initial and terminal constraints 
is formally described by the conditional probabilities 


$$\gamma(x_i, t) = \Pr(X_t = x_i \mid X_0 = x_0, X_T = x_g), \quad 0 \le t \le T \qquad (8)$$


for fixed initial state $x_0$ and terminal state $x_g$. We call these probabilities the
bridging probabilities. It is straightforward to see that $\gamma$ admits the factorization


$$\gamma(x_i, t) = \pi(x_i, t)\,\beta(x_i, t) / \pi(x_g, T) \qquad (9)$$


due to the Markov property. The normalization factor, given by the reachability
probability $\pi(x_g, T) = \beta(x_0, 0)$, ensures that $\gamma(\cdot, t)$ is a distribution for all time
points $t \in [0, T]$. We call each $\gamma(\cdot, t)$ a bridging distribution. From the Kolmogorov
forward and backward equations we can obtain both the forward probabilities
$\pi(\cdot, t)$ and the backward probabilities $\beta(\cdot, t)$ for $t < T$.
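The factorization (9) can be checked numerically on a chain small enough to solve by hand. The following sketch is illustrative only: it assumes a hypothetical two-state MJP with rates `a` (state 0 to 1) and `b` (state 1 to 0), which is not one of the models of this paper, and it uses the closed-form transition probabilities of that chain in place of a numerical solution of the Kolmogorov equations. All function names are made up for this example.

```python
import math


def p_two_state(a, b, t):
    """Transition matrix exp(Qt) of the two-state chain with rates 0->1: a, 1->0: b."""
    s = a + b
    e = math.exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]


def bridging(a, b, x0, xg, t, horizon):
    """Bridging distribution gamma(., t) = pi(., t) * beta(., t) / pi(x_g, T)."""
    fwd = p_two_state(a, b, t)[x0]                                  # pi(x, t)
    bwd = [p_two_state(a, b, horizon - t)[x][xg] for x in (0, 1)]   # beta(x, t)
    norm = p_two_state(a, b, horizon)[x0][xg]                       # pi(x_g, T) = beta(x_0, 0)
    return [f * g / norm for f, g in zip(fwd, bwd)]


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, bridging(1.0, 0.5, 0, 1, t, 2.0))
```

At $t = 0$ the result is the point mass on the initial state and at $t = T$ the point mass on the terminal state; in between, the entries sum to one because of the normalization by $\pi(x_g, T)$.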

We can easily extend this procedure to deal with hitting times constrained
by a finite time-horizon by making the goal state $x_g$ absorbing.

In Figure 1 we plot the forward, backward, and bridging probabilities for
Model 1. The probabilities are computed on a [0, 100] state-space truncation. The
approximate forward solution $\hat{\pi}$ shows how the probability mass drifts upwards
towards the stationary distribution Poisson(100). The backward probabilities
[Figure 1: three panels showing forward probs. $\hat{\pi}$, backward probs. $\hat{\beta}$, and bridging probs. $\hat{\gamma}$; axis label #X.]


Fig. 1. Forward, backward, and bridging probabilities for Model 1 with initial con-
straint $X_0 = 0$ and terminal constraint $X_{10} = 40$ on a truncated state-space. Proba-
bilities over 0.1 in $\hat{\pi}$ and $\hat{\beta}$ are given full intensity for visual clarity. The lightly shaded
area (> 60) indicates a region being more relevant for the forward than for the bridging
probabilities.


are highest for states below the goal state $x_g = 40$. This is expected because
upwards drift makes reaching $x_g$ more probable for “lower” states. Finally, the
approximate bridging distribution $\hat{\gamma}$ can be recognized to be proportional to the
product of forward $\hat{\pi}$ and backward probabilities $\hat{\beta}$.


4 Bridge Truncation via Lumping Approximations 


We first discuss the truncation of countably infinite state-spaces to analyze back-
ward and forward probabilities (Section 4.1). To identify effective truncations we
employ a lumping scheme. In Section 4.2 we explain the construction of macro-
states and assumptions made, as well as the efficient calculation of transition
rates between them. Finally, in Section 4.3 we present an iterative refinement
algorithm yielding a suitable truncation for the bridging problem.


4.1 Finite State Projection 


Even in simple models such as a birth-death process (Model 1), the reachable
state-space is countably infinite. Direct analyses of backward (6) and forward
equations are often infeasible. Instead, the integration of these differential 
equations requires working with a finite subset of the infinite state-space [37]. If 
states are truncated, their incoming transitions from states that are not trun- 
cated can be re-directed to a sink state. The accumulated probability in this 
sink state is then used as an error estimate for the forward integration scheme. 
Consequently, many truncation schemes, such as dynamic truncations [4], aim 
to minimize the amount of “lost mass” of the forward probability. We use the 
same truncation method but base the truncation on bridging probabilities rather 
than the forward probabilities. 
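The sink-state construction can be sketched on the birth-death process. The code below is a minimal illustration under stated assumptions, not the implementation used for the experiments: the rates `birth` and `death` are free parameters of the sketch, and it integrates the forward equation by uniformization (a swapped-in technique chosen to keep the example dependency-free) rather than with an ODE integrator. The function name `fsp_forward` is hypothetical.

```python
import math


def fsp_forward(birth, death, n_max, t, tol=1e-10):
    """Forward probabilities at time t on the truncation {0, ..., n_max} plus a sink.

    The birth transition out of n_max is redirected to the sink (last index),
    so the sink accumulates the probability mass lost by the truncation.
    Adequate for moderate values of (birth + death * n_max) * t only.
    """
    n = n_max + 2                        # micro-states 0..n_max and one sink
    unif = birth + death * n_max         # uniformization rate >= every exit rate
    p = [0.0] * n
    p[0] = 1.0                           # X_0 = 0 with probability one
    result = [0.0] * n
    lam = unif * t
    weight = math.exp(-lam)              # Poisson(lam) weight for k = 0 jumps
    k, acc = 0, 0.0
    k_max = int(lam + 10.0 * math.sqrt(lam) + 50)
    while acc < 1.0 - tol and k <= k_max:
        for i in range(n):
            result[i] += weight * p[i]
        acc += weight
        nxt = [0.0] * n                  # one step of the uniformized chain
        for x in range(n_max + 1):
            up = birth / unif
            down = death * x / unif
            nxt[x + 1] += p[x] * up      # for x == n_max this is the sink
            if x > 0:
                nxt[x - 1] += p[x] * down
            nxt[x] += p[x] * (1.0 - up - down)
        nxt[n - 1] += p[n - 1]           # the sink is absorbing
        p = nxt
        k += 1
        weight *= lam / k
    return result


if __name__ == "__main__":
    # Illustrative rates only, chosen so that birth / death = 100.
    probs = fsp_forward(birth=10.0, death=0.1, n_max=100, t=10.0)
    print("mass in sink (truncation error estimate):", probs[-1])
```

The last entry of the returned vector is the error estimate discussed above: it is exactly the mass that the forward solution would have placed outside the truncation.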


4.2 State-Space Lumping 


When dealing with bridging problems, the most likely trajectories from the initial 
to the terminal state are typically not known a priori. Especially if the event in 
question is rare, obtaining a state-space truncation adapted to its constraints is
difficult. We devise a lumping scheme that groups nearby states, i.e. molecule 
counts, into larger macro-states. A macro-state is a collection of states treated 
as one state in a lumped model, which can be seen as an abstraction of the 
original model. These macro-states form a partitioning of the state-space. In this 
lumped model, we assume a uniform distribution over the constituent micro- 
states inside each macro-state. Thus, given that the system is in a particular 
macro-state, all of its micro-states are equally likely. This partitioning allows us 
to analyze significant regions of the state-space efficiently albeit under a rough 
approximation of the dynamics. Iterative refinement of the state-space after each 
analysis moves the dynamics closer to the original model. In the final step of the 
iteration, the considered system states are at the granularity of the original model 
such that no approximation error is introduced by assumptions of the lumping 
scheme. Computational efficiency is retained by truncating in each iteration 
step those states that contribute little probability mass to the (approximated) 
bridging distributions. 

We choose a lumping scheme based on a grid of hypercube macro-states whose 
endpoints belong to a predefined grid. This topology makes the computation 
of transition rates between macro-states particularly convenient. Mass-action 
reaction rates, for example, can be given in a closed-form due to the Faulhaber 
formulae. More complicated rate functions such as Hill functions can often be 
handled as well by taking appropriate integrals. 

Our choice is a scheme that uses $n_S$-dimensional hypercubes. A macro-state
$\bar{x}_i(\ell^{(i)}, u^{(i)})$ (denoted by $\bar{x}_i$ for notational ease) can therefore be described by
two vectors $\ell^{(i)}$ and $u^{(i)}$. The vector $\ell^{(i)}$ gives the corner closest to the origin,
while $u^{(i)}$ gives the corner farthest from the origin. Formally,


$$\bar{x}_i = \bar{x}_i(\ell^{(i)}, u^{(i)}) = \{ x \in \mathbb{N}^{n_S} \mid \ell^{(i)} \le x \le u^{(i)} \}, \qquad (10)$$


where '$\le$' stands for the element-wise comparison. This choice of topology makes
the computation of transition rates between macro-states particularly
convenient: Suppose we are interested in the set of micro-states in macro-state
$\bar{x}_i$ that can transition to macro-state $\bar{x}_k$ via reaction $j$. It is easy to see that
this set is itself an interval-defined macro-state $\bar{x}_{i \xrightarrow{j} k}$. To compute this
macro-state we can simply shift $\bar{x}_i$ by $v_j$, take the intersection with $\bar{x}_k$, and
project this set back. Formally,


$$\bar{x}_{i \xrightarrow{j} k} = \left( (\bar{x}_i + v_j) \cap \bar{x}_k \right) - v_j, \qquad (11)$$


where the additions are applied element-wise to all states making up the macro-
states. For the correct handling of the truncation it is useful to define a general
exit state


$$\bar{x}_{i \xrightarrow{j}} = \left( (\bar{x}_i + v_j) \setminus \bar{x}_i \right) - v_j. \qquad (12)$$


This state captures all micro-states inside $\bar{x}_i$ that can leave the state via reaction
$j$. Note that all operations preserve the structure of a macro-state as defined in
(10). Since a macro-state is based on intervals, the computation of the transition
rate is often straightforward. Under the assumption of polynomial rates, as
[Figure 2: panels "original version", "lumped version", and "terminal distribution" (original vs. lumped), state axis from 0 to 200.]


Fig. 2. A lumping approximation of Model 1 on the state-space truncation to [0, 200]
on $t \in [0, 50]$. On the left-hand side, solutions of a regular truncation approximation
and a lumped truncation (macro-state size is 5) are given. On the right-hand side, the
respective terminal distributions $\Pr(X_{50} = x_i)$ are contrasted.


it is the case for mass-action systems, we can compute the sum of rates over
this transition set efficiently using Faulhaber's formula. We define the lumped
transition function


$$\bar{\alpha}_j(\bar{x}) = \sum_{x \in \bar{x}} \alpha_j(x) \qquad (13)$$


for macro-state $\bar{x}$ and reaction $j$. As an example consider the following mass-
action reaction $2X \xrightarrow{c} \varnothing$. For macro-state $\bar{x} = \{0, \dots, n\}$ we can compute the
corresponding lumped transition rate


$$\bar{\alpha}(\bar{x}) = \frac{c}{2} \sum_{i=1}^{n} i(i-1) = \frac{c}{2} \sum_{i=1}^{n} (i^2 - i) = \frac{c}{2} \left( \frac{2n^3 + 3n^2 + n}{6} - \frac{n^2 + n}{2} \right),$$


eliminating the explicit summation in the lumped propensity function.

For polynomial propensity functions $\alpha$ such formulae are easily obtained au-
tomatically. For non-polynomial propensity functions, we can use the continuous
integral as an approximation. This is demonstrated on a case study in Section 5.2.
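As a small check of the closed form for the reaction $2X \rightarrow \varnothing$, the sketch below evaluates the lumped rate over an arbitrary interval $\{a, \dots, b\}$ without a loop. It relies on the identity $\sum_{x=0}^{n} x(x-1)/2 = (n+1)n(n-1)/6$, which is the bracketed expression above after simplification. The function name and the rate constant `c` are assumptions of this example.

```python
def lumped_pair_rate(c, a, b):
    """Sum of c * x * (x - 1) / 2 over x in {a, ..., b}, evaluated in closed form."""

    def prefix(n):
        # sum_{x=0}^{n} x (x - 1) / 2 = (n + 1) n (n - 1) / 6; the product of
        # three consecutive integers is always divisible by 6.
        return (n + 1) * n * (n - 1) // 6

    return c * (prefix(b) - prefix(a - 1))


if __name__ == "__main__":
    explicit = sum(0.5 * x * (x - 1) / 2 for x in range(0, 11))
    print(lumped_pair_rate(0.5, 0, 10), explicit)
```

The cost of the closed form does not depend on the size of the macro-state, which is what keeps coarse lumpings cheap to set up.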


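Using the transition set computation (11) and the lumped propensity function
(13) we can populate the Q-matrix of the finite lumping approximation:


$$\bar{Q}_{\bar{x}_i, \bar{x}_k} = \begin{cases} \sum_{j} \bar{\alpha}_j\left(\bar{x}_{i \xrightarrow{j} k}\right) / \mathrm{vol}(\bar{x}_i), & \text{if } \bar{x}_i \neq \bar{x}_k \\ -\sum_{j} \bar{\alpha}_j\left(\bar{x}_{i \xrightarrow{j}}\right) / \mathrm{vol}(\bar{x}_i), & \text{otherwise} \end{cases} \qquad (14)$$


In addition to the lumped rate function over the transition state $\bar{x}_{i \xrightarrow{j} k}$, we need to
divide by the total volume of the lumped state $\bar{x}_i$. This is due to the assumption
of a uniform distribution inside the macro-states. Using this Q-matrix, we can
compute the forward and backward solution using the respective Kolmogorov
equations (5) and (7).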
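The construction of the lumped Q-matrix can be sketched as follows. This is a minimal illustration, not the paper's implementation: macro-states are represented as `(lower, upper)` pairs of integer tuples, the lumped propensity is computed by explicit summation rather than by a closed form, the exit rate is obtained as the total rate minus the rate of transitions that stay inside the macro-state, and rates leaving the truncation are sent to an extra sink index. All names are hypothetical.

```python
from itertools import product


def transition_set(box_i, box_k, v):
    """((box_i + v) intersected with box_k) - v for boxes (lower, upper); None if empty."""
    (li, ui), (lk, uk) = box_i, box_k
    lo = tuple(max(a + d, b) - d for a, b, d in zip(li, lk, v))
    hi = tuple(min(a + d, b) - d for a, b, d in zip(ui, uk, v))
    return (lo, hi) if all(l <= h for l, h in zip(lo, hi)) else None


def volume(box):
    """Number of micro-states contained in a box."""
    vol = 1
    for l, h in zip(*box):
        vol *= h - l + 1
    return vol


def lumped_rate(alpha, box):
    """Sum of the propensity alpha over all micro-states of a box."""
    ranges = (range(l, h + 1) for l, h in zip(*box))
    return sum(alpha(x) for x in product(*ranges))


def lumped_q(boxes, reactions):
    """Dense lumped generator; reactions is a list of (propensity, change vector)."""
    n = len(boxes)
    q = [[0.0] * (n + 1) for _ in range(n + 1)]      # index n is the sink
    for i, box_i in enumerate(boxes):
        vol = volume(box_i)
        for alpha, v in reactions:
            stay = transition_set(box_i, box_i, v)
            inside = lumped_rate(alpha, stay) if stay else 0.0
            exit_rate = lumped_rate(alpha, box_i) - inside
            routed = 0.0
            for k, box_k in enumerate(boxes):
                target = transition_set(box_i, box_k, v) if k != i else None
                if target:
                    rate = lumped_rate(alpha, target)
                    q[i][k] += rate / vol
                    routed += rate
            q[i][n] += (exit_rate - routed) / vol    # leaves the truncation
            q[i][i] -= exit_rate / vol
    return q


if __name__ == "__main__":
    # Two macro-states of size 5 for a birth-death process (illustrative rates).
    boxes = [((0,), (4,)), ((5,), (9,))]
    reactions = [(lambda x: 10.0, (1,)), (lambda x: 0.1 * x[0], (-1,))]
    for row in lumped_q(boxes, reactions):
        print(row)
```

In the one-dimensional example only the boundary micro-state of each interval can change macro-state, so each off-diagonal entry is a single propensity divided by the volume 5.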

Interestingly, the lumped distribution tends to be less concentrated. This is 
due to the assumption of a uniform distribution inside macro-states. This effect 
is illustrated by the example of a birth-death process in Figure 2. Due to this
effect, an iterative refinement typically keeps an over-approximation in terms of
state-space area. This is a desirable feature since relevant regions are less likely 
to be pruned due to lumping approximations. 


4.3 Iterative Refinement Algorithm 


The iterative refinement algorithm (Alg. 1) starts with a set of large macro-states
that are iteratively refined, based on approximate solutions to the bridging prob-
lem. We start by constructing square macro-states of size $2^m$ in each dimension
for some $m \in \mathbb{N}$ such that they form a large-scale grid $\mathcal{S}^{(0)}$. Hence, each initial
macro-state has a volume of $(2^m)^{n_S}$. This choice of grid size is convenient be-
cause we can halve states in each dimension. Moreover, this choice ensures that
all states have equal volume and we end up with states of volume $2^0 = 1$, which
is equivalent to a truncation of the original non-lumped state-space.

An iteration of the state-space refinement starts by computing both the for-
ward and backward probabilities (lines 2 and 3) via integration of (5) and (7),
respectively, using the lumped Q-matrix. Based on the resulting approximate
forward and backward probabilities, we compute an approximation of the bridg-
ing distributions (line 4). This is done for each time-point in an equispaced grid
on $[0, T]$. The time grid granularity is a hyper-parameter of the algorithm. If
the grid is too fine, the memory overhead of storing backward $\hat{\beta}^{(i)}$ and forward
solutions $\hat{\pi}^{(i)}$ increases.² If, on the other hand, the granularity is too low, too
much of the state-space might be truncated. Based on a threshold parameter
$\delta > 0$ states are either removed or split (line 7), depending on the mass assigned
to them by the approximate bridging probabilities $\hat{\gamma}_t^{(i)}$. A state can be split by
the split-function which halves the state in each dimension. Otherwise, it is
removed. Thus, each macro-state is either split into $2^{n_S}$ new states or removed
entirely. The result forms the next lumped state-space $\mathcal{S}^{(i+1)}$. The Q-matrix is
adjusted (line 10) such that transition rates for $\mathcal{S}^{(i+1)}$ are calculated accord-
ing to (14). Entries of truncated states are removed from the transition matrix.
Transitions leading to them are re-directed to a sink state (see Section 4.1). Af-
ter $m$ iterations (we started with states of side lengths $2^m$) we have a standard
finite state projection scheme on the original model tailored to computing an
approximation of the bridging distribution.
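The split-or-remove step can be sketched in a few lines. The code is an illustration under assumptions, not the authors' implementation: macro-states are `(lower, upper)` pairs of integer tuples with side lengths that are powers of two greater than one, and `bridging[i]` is assumed to hold the approximate bridging probabilities of macro-state `i` at the points of the time grid. The names `split` and `refine` mirror the text but are otherwise hypothetical.

```python
from itertools import product


def split(box):
    """Halve a box in every dimension, yielding 2**n children (side length > 1 assumed)."""
    lo, hi = box
    halves = []
    for l, h in zip(lo, hi):
        mid = (l + h) // 2
        halves.append(((l, mid), (mid + 1, h)))
    return [(tuple(part[0] for part in combo), tuple(part[1] for part in combo))
            for combo in product(*halves)]


def refine(boxes, bridging, delta):
    """Keep and split boxes whose bridging probability reaches delta at some time point."""
    refined = []
    for box, probs in zip(boxes, bridging):
        if max(probs) >= delta:
            refined.extend(split(box))
        # otherwise the box is truncated; its incoming rates go to the sink
    return refined


if __name__ == "__main__":
    boxes = [((0, 0), (15, 15)), ((16, 0), (31, 15))]
    bridging = [[0.9, 0.4, 0.1], [1e-6, 1e-5, 1e-4]]
    print(refine(boxes, bridging, delta=5e-3))
```

Taking the maximum over the time grid corresponds to the existential condition over time points in line 7 of Algorithm 1, which is why a grid that is too coarse can miss a state that is only briefly relevant.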

In Figure 3 we give a demonstration of how Algorithm 1 works to refine the
state-space iteratively. Starting with an initial lumped state-space $\mathcal{S}^{(0)}$ covering
a large area of the state-space, repeated evaluations of the bridging distributions
are performed. After five iterations the remaining truncation includes all states
that significantly contribute to the bridging probabilities over the times $[0, T]$.

It is important to realize that determining the most relevant states is the 
main challenge. The above algorithm solves this problem by considering only 


² We denote the approximations with a hat (e.g. $\hat{\pi}$) rather than a bar (e.g. $\bar{\pi}$) to
indicate that not only the lumping approximation but also a truncation is applied,
and similarly for the Q-matrix.
Algorithm 1: Iterative refinement for the bridging problem


input : Initial partitioning $\mathcal{S}^{(0)}$, truncation threshold $\delta$
output: approximate bridging distribution $\hat{\gamma}$


 1 for i = 1, ..., m do
 2     $\hat{\pi}_t^{(i)}$ ← solve approximate forward equation on $\mathcal{S}^{(i)}$;
 3     $\hat{\beta}_t^{(i)}$ ← solve approximate backward equation on $\mathcal{S}^{(i)}$;
 4     $\hat{\gamma}_t^{(i)}$ ← $\hat{\beta}_t^{(i)} \hat{\pi}_t^{(i)} / \hat{\pi}(x_g, T)$;      /* approximate bridging distribution */
 5     $\mathcal{S}^{(i+1)}$ ← ∅;
 6     foreach $\bar{x} \in \mathcal{S}^{(i)}$ do
 7         if ∃t. $\hat{\gamma}_t^{(i)}(\bar{x}) \ge \delta$;      /* refine based on bridging probabilities */
 8         then
 9             $\mathcal{S}^{(i+1)}$ ← $\mathcal{S}^{(i+1)}$ ∪ split($\bar{x}$);
10     update Q-matrix;


11 return $\hat{\gamma}^{(i)}$;


[Figure 3: five panels of successive refinements, titled #states=101, #states=93, #states=245, #states=613, #states=1477.]


Fig. 3. The state-space refinement algorithm on two parallel unit-rate arrival processes.
The bridging problem from (0,0) to (64,64) and T = 10 and truncation threshold
$\delta$ = 5e-3. States with a bridging probability below $\delta$ are light grey. The macro-state
containing the goal state is marked in black. The initial macro-states are of size 16 × 16.


those parts of the state-space that contribute most to the bridging probabilities. 
The truncation is tailored to this condition and might ignore regions that are 
likely in the unconditioned case. For instance, in Fig. 1 the bridging probabili-
ties mostly remain below a population threshold of #X = 60 (as indicated by
the lighter/darker coloring), while the forward probabilities mostly exceed this
bound. Hence, in this example a significant portion of the forward probabilities
$\hat{\pi}_t^{(i)}$ is captured by the sink state. However, the condition in line 7 in Algorithm 1
ensures that states contributing significantly to $\hat{\gamma}_t^{(i)}$ will be kept and refined in
the next iteration.


5 Results 


We present four examples in this section to evaluate our proposed method.
A prototype was implemented in Python 3.8. For numerical integration we
used the SciPy implementation [47] of the implicit method based on backward-
differentiation formulas [13]. The analysis as a Jupyter notebook is made avail-
able online.³


threshold $\delta$      1e-2         1e-3         1e-4         1e-5
truncation size   1154         2354         3170         3898
overall states    2074         3546         4586         5450
estimate          8.8851e-30   1.8557e-29   1.8625e-29   1.8625e-29
rel. error        5.2297e-01   3.6667e-03   3.7423e-05   9.5259e-08


Table 1. Estimated reachability probabilities based on varying truncation thresholds
$\delta$. The true probability is 1.8625e-29. We also report the size of the final truncation and
the accumulated size of all truncations during refinement iterations (overall states).


5.1 Bounding Rare Event Probabilities 


We consider a simple model of two parallel Poisson processes describing the 
production of two types of agents. The corresponding probability distribution 
has Poisson product form at all time points t > 0 and hence we can compare 
the accuracy of our numerical results with the exact analytic solution. We use 
the proposed approach to compute lower bounds for rare event probabilities. 


Model 2 (Parallel Poisson Processes). The model consists of two parallel
independent Poisson processes with unit rates.


$\varnothing \xrightarrow{1} A$ and $\varnothing \xrightarrow{1} B$.


The initial condition $X_0 = (0, 0)$ holds with probability one. After $t$ time units
each species abundance is Poisson distributed with rate $\lambda = t$.


We consider the final constraint of reaching a state where both processes exceed
a threshold of 64 at time 20. Without prior knowledge, a reasonable truncation
would have been 160 × 160. But our analysis shows that just 20% of the states are
necessary to capture over 99.6% of the probability mass reaching the target event
(cf. Table 1). Decreasing the threshold $\delta$ leads to a larger set of states retained
after truncation as more of the bridging distribution is included (cf. Figure 4).
We observe an increase in truncation size that is approximately logarithmic in $\delta$,
which, in this example, indicates robustness of the method with respect to the
choice of $\delta$.
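The exact reference value used for comparison follows directly from the Poisson product form. The sketch below is an independent check, not taken from the paper's notebook: it assumes that "exceed a threshold of 64" means a count of at least 64, and it sums the Poisson tail in log-space with the standard library. Function names are hypothetical.

```python
import math


def poisson_tail(lam, k_min, terms=400):
    """Pr(X >= k_min) for X ~ Poisson(lam), summing pmf terms computed in log-space."""
    return sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
               for k in range(k_min, k_min + terms))


def both_exceed(t, threshold):
    """Probability that two independent unit-rate Poisson processes are both
    at or above the threshold at time t."""
    return poisson_tail(t, threshold) ** 2


def relative_error(estimate, truth):
    """Relative error of a lower-bound estimate with respect to the true value."""
    return abs(truth - estimate) / truth


if __name__ == "__main__":
    truth = both_exceed(20.0, 64)
    print("reference probability:", truth)
    print("rel. error of 1.8557e-29:", relative_error(1.8557e-29, truth))
```

Under this reading the computed value agrees with the true probability reported with Table 1 to the printed precision, which supports the interpretation of the terminal constraint as "at least 64".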


³ https://www.github.com/mbackenkoehler/mjp_bridging


⁴ These bounds are rigorous up to the approximation error of the numerical inte-
gration scheme. However, the forward solution could be replaced by an adaptive
uniformization approach [3] for a more rigorous integration error control.
Fig. 4. State-space truncation for varying values of the threshold parameter $\delta$: Two
parallel Poisson processes under terminal constraints $X_T^{(A)} \ge 64$ and $X_T^{(B)} \ge 64$. The
initial macro-states are 16 × 16 such that the final states are regular micro states.


Comparison to other methods The truncation approach that we apply is similar 
to the one used by Mikeev et al. for rare event estimation. However, they used 
a given linearly biased MJP model to obtain a truncation. A general strategy 
to compute an appropriate biasing was not proposed. It is possible to adapt 
our truncation approach to the dynamic scheme in Ref. [84] where states are 
removed in an on-the-fly fashion during numerical integration. 

A finite state-space truncation covering the same area as the initial lumping
approximation would contain 25,600 states.⁵ The standard approach would be
to build up the entire state-space for such a model [27]. Even using a conser-
vative truncation threshold $\delta$ = 1e-5, our method yields an accurate estimate
using only about a fifth of this number (5450 states, accumulated over all
intermediate lumped approximations).


5.2 Mode Switching 


Mode switching occurs in models exhibiting multi-modal behavior [44] when a 
trajectory traverses a potential barrier from one mode to another. Often, mode 
switching is a rare event and occurs in the context of gene regulatory networks 
where a mode is characterized by the set of genes being currently active [30]. 
Similar dynamics also commonly occur in queuing models where a system may 
for example switch its operating behavior stochastically if traffic increases above 
or decreases below certain thresholds. Using the presented method, we can get 
both a qualitative and quantitative understanding of switching behavior without 
resorting to Monte-Carlo methods such as importance sampling. 


Exclusive Switch The exclusive switch [7] has three different modes of opera- 
tion, depending on the DNA state, i.e. on whether a protein of type one or two 
is bound to the DNA. 


⁵ Here, the goal is not treated as a single state. Otherwise, it consists of 24,130 states.
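Model 3 (Exclusive Switch). The exclusive switch model consists of a pro-
moter region that can express both proteins $P_1$ and $P_2$. Both can bind to the
region, suppressing the expression of the other protein. For certain parameteri-
zations, this leads to a bi-modal or even tri-modal behavior.


$D \xrightarrow{\rho} D + P_1$,   $D \xrightarrow{\rho} D + P_2$,   $P_1 \xrightarrow{\lambda} \varnothing$,   $P_2 \xrightarrow{\lambda} \varnothing$,

$D + P_1 \xrightarrow{\beta} D.P_1$,   $D.P_1 \xrightarrow{\gamma} D + P_1$,   $D.P_1 \xrightarrow{\alpha} D.P_1 + P_1$,

$D + P_2 \xrightarrow{\beta} D.P_2$,   $D.P_2 \xrightarrow{\gamma} D + P_2$,   $D.P_2 \xrightarrow{\alpha} D.P_2 + P_2$.

The parameter values are $\rho$ = 1e-1, $\lambda$ = 1e-3, $\beta$ = 1e-2, $\gamma$ = 8e-3, and $\alpha$ = 1e-1.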


Since we know a priori of the three distinct operating modes, we adjust the
method slightly: The state-space for the DNA states is not lumped. Instead
we “stack” lumped approximations of the $P_1$-$P_2$ phase space upon each other.
Special treatment of DNA states is common for such models [28].

To analyze the switching, we choose the transition from (variable order: $P_1$,
$P_2$, $D$, $D.P_1$, $D.P_2$) $x_1 = (32, 0, 0, 0, 1)$ to $x_2 = (0, 32, 0, 1, 0)$ over the time
interval $t \in [0, 10]$. The initial lumping scheme covers up to 80 molecules of $P_1$
and $P_2$ for each mode. Macro-states have size 8 × 8 and the truncation threshold
is $\delta$ = 1e-4.

In the analysis of biological switches, not only the switching probability but
also the switching dynamics is a central part of understanding the underlying
biological mechanisms. In Figure 5 (left), we therefore plot the time-varying
probabilities of the gene state conditioned on the mode. We observe a rapid un-
binding of $P_2$, followed by a slow increase of the binding probability for $P_1$. These
dynamics are already qualitatively captured by the first lumped approximation
(dashed lines).


Toggle Switch Next, we apply our method to a toggle switch model exhibiting 
non-polynomial rate functions. This well-known model considers two proteins A 
and B inhibiting the production of the respective other protein [29]. 


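Model 4 (Toggle Switch, Hill functions). We have population types A and B
with the following reactions and reaction rates.


$\varnothing \xrightarrow{\alpha_1(\cdot)} A$, where $\alpha_1(x) = \frac{\rho}{1 + x_B}$,   $A \xrightarrow{\lambda} \varnothing$,

$\varnothing \xrightarrow{\alpha_2(\cdot)} B$, where $\alpha_2(x) = \frac{\rho}{1 + x_A}$,   $B \xrightarrow{\lambda} \varnothing$.


The parameterization is $\rho = 10$, $\lambda = 0.1$.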


Fig. 5. (left) Mode probabilities of the exclusive switch bridging problem over time for
the first lumped approximation (dashed lines) and the final approximation (solid lines)
with constraints $X_0 = (32, 0, 0, 1, 0)$ and $X_{10} = (0, 32, 0, 0, 1)$. (right) The expected
occupation time (excluding initial and terminal states) for the switching problem of
the toggle switch using Hill-type functions. The bridging problem is from initial (0, 120)
to a first passage of (120, 0) in $t \in [0, 10]$.


Due to the non-polynomial rate functions $\alpha_1$ and $\alpha_2$, the transition rates between
macro-states are approximated by using the continuous integral


$$\bar{\alpha}(\bar{x}) \approx \int_{a-0.5}^{b+0.5} \frac{\rho}{1+x}\, dx = \rho \left( \log(b + 1.5) - \log(a + 0.5) \right)$$


for a macro-state $\bar{x} = \{a, \dots, b\}$.
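The quality of the continuous approximation can be gauged by comparing it with the explicit sum over a macro-state. The sketch below does this for the Hill-type rate $\rho / (1 + x)$ with $\rho = 10$ as in Model 4; the function names and the chosen intervals are assumptions of this example.

```python
import math


def hill_lumped_exact(rho, a, b):
    """Explicit lumped rate: sum of rho / (1 + x) over x in {a, ..., b}."""
    return sum(rho / (1 + x) for x in range(a, b + 1))


def hill_lumped_integral(rho, a, b):
    """Continuous approximation rho * (log(b + 1.5) - log(a + 0.5))."""
    return rho * (math.log(b + 1.5) - math.log(a + 0.5))


if __name__ == "__main__":
    for a, b in [(0, 31), (32, 63), (96, 127)]:
        exact = hill_lumped_exact(10.0, a, b)
        approx = hill_lumped_integral(10.0, a, b)
        print((a, b), exact, approx, abs(approx - exact) / exact)
```

The integral behaves like a midpoint rule, so it is most accurate where the rate function is flat; the macro-state touching zero, where $\rho / (1 + x)$ is steepest, shows the largest deviation.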

We analyze the switching scenario from (0, 120) to the first visit of state
(120, 0) up to time T = 10. The initial lumping scheme covers up to 352 molecules
of A and B and macro-states have size 32 × 32. The truncation threshold is
$\delta$ = 1e-4. The resulting truncation is shown in Figure 5 (right). It also illustrates
the kind of insights that can be obtained from the bridging distributions. For
an overview of the switching dynamics, we look at the expected occupation
time under the terminal constraint of having entered state (120, 0). Letting the
corresponding hitting time be $\tau = \inf\{t \ge 0 \mid X_t = (120, 0)\}$, the expected
occupation time for some state $x$ is $E\left( \int_0^{\tau} 1_{=x}(X_t)\, dt \mid \tau \le 10 \right)$. We observe that
in this example the switching behavior seems to be asymmetrical. The main mass
seems to pass through an area where initially a small number of A molecules is
produced followed by a total decay of B molecules.


5.3 Recursive Bayesian Estimation 


We now turn to the method’s application in recursive Bayesian estimation. This 
is the problem of estimating the system’s past, present, and future behavior un- 
der given observations. Thus, the MJP becomes a hidden Markov model (HMM). 
The observations in such models are usually noisy, meaning that we cannot infer 
the system state with certainty. 

This estimation problem entails more general distributional constraints on
terminal $\beta(\cdot, T)$ and initial $\pi(\cdot, 0)$ distributions than the point mass distributions
considered up until now. We can easily extend the forward and backward proba-
bilities to more general initial distributions and terminal distributions $\beta_T$. For
the forward probabilities we get


$$\pi(x_i, t) = \sum_j \Pr(X_t = x_i \mid X_0 = x_j)\, \pi(x_j, 0), \qquad (15)$$


and similarly the backward probabilities are given by


$$\beta(x_i, t) = \sum_j \Pr(X_T = x_j \mid X_t = x_i)\, \beta_T(x_j). \qquad (16)$$

We apply our method to an SEIR (susceptible-exposed-infected-removed) model. 
This is widely used to describe the spreading of an epidemic such as the current 
COVID-19 outbreak [23,20]. Temporal snapshots of the epidemic spread are
mostly only available for a subset of the population and suffer from inaccuracies 
of diagnostic tests. Bayesian estimation can then be used to infer the spreading 
dynamics given uncertain temporal snapshots. 


Model 5 (Epidemics Model). A population of susceptible individuals can 
contract a disease from infected agents. In this case, they are exposed, mean- 
ing they will become infected but cannot yet infect others. After being infected, 
individuals change to the removed state. The mass-action reactions are as fol- 
lows. 

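$S + I \xrightarrow{\lambda} E + I$,   $E \xrightarrow{\mu} I$,   $I \xrightarrow{\rho} R$.


The parameter values are $\lambda = 0.5$, $\mu = 3$, $\rho = 3$. Due to the stoichiometric
invariant $X_t^{(S)} + X_t^{(E)} + X_t^{(I)} + X_t^{(R)} = \text{const.}$, we can eliminate R from the
system.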


We consider the following scenario: We know that initially (t = 0) one in- 
dividual is infected and the rest is susceptible. At time t = 0.3 all individuals 
are tested for the disease. The test, however, only identifies infected individuals 
with probability 0.99. Moreover, the probability of a false positive is 0.05. We
would like to identify the distribution given both the initial state and the measurement
at time t = 0.3. In particular, we want to infer the distribution over the latent 
counts of S and E by recursive Bayesian estimation. 

The posterior for $n_I$ infected individuals at time $t$, given measurement $Y_t =
\tilde{n}_I$, can be computed using Bayes' rule


$$\Pr(X_t^{(I)} = n_I \mid Y_t = \tilde{n}_I) \propto \Pr(Y_t = \tilde{n}_I \mid X_t^{(I)} = n_I)\, \Pr(X_t^{(I)} = n_I). \qquad (17)$$
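The update (17) can be sketched for a given prior over the number of infected individuals. Several ingredients below are assumptions made for illustration and are not stated in the text: the measurement is modelled as the sum of two independent binomials (true positives with probability 0.99 per infected individual, false positives with probability 0.05 per non-infected individual), the population size `n_total` is a free parameter, and the prior in the usage example is uniform, whereas in the method above it would come from the lumped forward solution.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability mass function, zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def likelihood(y, n_inf, n_total, sensitivity=0.99, false_positive=0.05):
    """Pr(Y = y | n_inf infected): true positives plus false positives (assumed model)."""
    return sum(binom_pmf(tp, n_inf, sensitivity)
               * binom_pmf(y - tp, n_total - n_inf, false_positive)
               for tp in range(0, min(y, n_inf) + 1))


def posterior(prior, y, n_total):
    """Normalized posterior over n_inf, where prior[n] = Pr(n_inf = n)."""
    unnorm = [likelihood(y, n, n_total) * p for n, p in enumerate(prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


if __name__ == "__main__":
    n_total = 100                                  # hypothetical population size
    prior = [1.0 / (n_total + 1)] * (n_total + 1)  # hypothetical uniform prior
    post = posterior(prior, 30, n_total)
    print("posterior mode:", max(range(len(post)), key=post.__getitem__))
```

Because false positives inflate the count, the posterior concentrates somewhat below the measured value; with a prior obtained from the forward solution the shift would additionally depend on the dynamics.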


This problem is an extension of the bridging problem discussed up until now.
The difference is that the terminal posterior is estimated using the result of the
lumped forward equation and the measurement distribution using (17). Based
on this estimated terminal posterior, we compute the bridging probabilities and
refine the truncation tailored to the location of the posterior distribution. In Fig-
ure 6 (left), we illustrate the bridging distribution between the terminal posterior
and initial distribution. In the context of filtering problems this is commonly re-
ferred to as smoothing. Using the learned truncation, we can obtain the posterior
distribution for the number of infected individuals at t = 0.3 (Figure 6 (middle)).
Moreover, we can infer a distribution over the unknown number of susceptible
and exposed individuals (Figure 6 (right)).
Fig. 6. (left) A comparison of the prior dynamics and the posterior smoothing (bridg-
ing) dynamics. (middle) The prior, likelihood, and posterior of the number of infected
individuals $n_I$ at time t = 0.3 given the measurement $\tilde{n}_I = 30$. (right) The prior and
posterior distribution over the latent types E and S.


6 Conclusion 

The analysis of Markov jump processes with constraints on the initial and ter-
minal behavior is an important part of many probabilistic inference tasks such
as parameter estimation using Bayesian or maximum likelihood estimation, in- 
ference of latent system behavior, the estimation of rare event probabilities, and 
reachability analysis for the verification of temporal properties. If endpoint con- 
straints correspond to atypical system behaviors, standard analysis methods fail 
as they have no strategy to identify those parts of the state-space relevant for 
meeting the terminal constraint. 

Here, we proposed a method that is not based on stochastic sampling and 
statistical estimation but provides a direct numerical approach. It starts with an 
abstract lumped model, which is iteratively refined such that only those parts of 
the model are considered that contribute to the probabilities of interest. In the 
final step of the iteration, we operate at the granularity of the original model 
and compute lower bounds for these bridging probabilities that are rigorous up 
to the error of the numerical integration scheme. 

Our method exploits the population structure of the model, which is present 
in many important application fields of MJPs. Based on experience with other 
work based on truncation, the approach can be expected to scale up to at least 
a few million states [33]. Compared to previous work, our method neither relies 
on approximations of unknown accuracy nor additional information such as a 
suitable change of measure in the case of importance sampling. It only requires 
a truncation threshold and an initial choice for the macro-state sizes. 

In future work, we plan to extend our method to hybrid approaches, in which 
a moment representation is employed for large populations while discrete counts 
are maintained for small populations. Moreover, we will apply our method to 
model checking where constraints are described by some temporal logic [21]. 
Abstract This paper presents an efficient procedure for multi-objective
model checking of long-run average reward (aka: mean pay-off) and to- 
tal reward objectives as well as their combination. We consider this for 
Markov automata, a compositional model that captures both traditional 
Markov decision processes (MDPs) as well as a continuous-time variant 
thereof. The crux of our procedure is a generalization of Forejt et al.’s 
approach for total rewards on MDPs to arbitrary combinations of long- 
run and total reward objectives on Markov automata. Experiments with 
a prototypical implementation on top of the STORM model checker show 
encouraging results for both model types and indicate substantially im-
proved performance over existing multi-objective long-run MDP model
checking based on linear programming. 


1 Introduction 


MDP model checking In various applications, multiple decision criteria and un- 
certainty frequently co-occur. Stochastic decision processes for which the ob- 
jective is to achieve multiple—possibly partly conflicting—objectives occur in 
various fields. These include operations research, economics, planning in AI, and 
game theory, to mention a few. This has stimulated model checking of Markov de- 
cision processes (MDPs) [46], a prominent model in decision making under uncer- 
tainty, against multiple objectives. This development enlarges the rich plethora 
of automated MDP verification algorithms against single objectives [7]. 


Multi-objective MDP Various types of objectives known from conventional— 
single-objective—model checking have been lifted to the multi-objective case. 
These objectives range over ω-regular specifications including LTL [26,27],
expected (discounted and non-discounted) total rewards [21,27,28,52,22],
step-bounded and reward-bounded reachability probabilities [28,35], and, most
relevant for this work, expected long-run average (LRA) rewards [18,11,20], also
known as mean pay-offs. For the latter, all current approaches build upon linear
programming (LP) which yields a theoretical time-complexity polynomial in
the model size. However, in practice, LP-based methods are often outperformed 
by approaches based on value- or strategy iteration [28,1,42]. The LP-based 
approach of [27] and the iterative approach of [28] are both implemented in 
PRISM [45] and STORM [40]. The LP formulation of [11,20] is implemented in 
MULTIGAIN [12], an extension of PRISM for multi-objective LRA rewards. 
Contributions of this paper We present a computationally efficient procedure for
multi-objective model checking of LRA reward and total reward objectives as
well as their mixture. The crux of our procedure is a generalization of Forejt et
al.'s iterative approach [28] for total rewards on MDPs to expected LRA reward
objectives. In fact, our approach supports arbitrary mixtures of expected
LRA and total reward objectives. To our knowledge, such mixtures have not
been considered so far. Experiments on various benchmarks using a prototypi-
cal implementation in STORM indicate that this generalized iterative algorithm
outperforms the LP approach implemented in MULTIGAIN.

In addition, we extend this approach towards Markov automata (MA) [25,23], 
a continuous-time variant of MDP that is amenable to compositional model- 
ing. This model is well-suited, among others, to provide a formal semantics 
for dynamic fault trees and generalized stochastic Petri nets [24]. Our multi-
objective LRA approach for MA builds upon the value-iteration approach for
single-objective expected LRA rewards on MA [17] which, on practical models,
outperforms the LP-based approach of [30]. To the best of our knowledge, this
is the first multi-objective expected LRA reward approach for MA. Experimental 
results on MA benchmarks show that the treatment of a continuous-time variant 
of LRA comes at almost no time penalty compared to the MDP setting. 


Other related work Mixtures of various other objectives have been considered for 
MDPs. This includes conditional expectations or ratios of reward functions [5,4]. 
[31] considers LTL formulae with probability thresholds while maximizing an 
expected LRA reward. [35,41] address multi-objective quantiles on reachabil- 
ity properties while [50,20] consider multi-objective combinations of percentile 
queries on MDP and LRA objectives. [6] treats resilient systems ensuring con- 
straints on the repair mechanism while maximizing the expected LRA reward 
when being operational. The trade-off between expected LRA rewards and their 
variance is analyzed in [13]. [33] studies multiple objectives on interval MDP, 
where transition probabilities can be specified as intervals in cases where the con- 
crete probabilities are unknown. Multiple LRA reward objectives for stochastic 
games have been treated using LP [19] and value iteration over convex sets [8,9]; 
the latter is included in PRISM-GAMES [44,43]. These approaches can also be
applied to MDPs when viewed as one-player stochastic games. Algorithms for 
single-objective model checking of MA deal with objectives such as expected to- 
tal rewards, time-bounded reachability probabilities, and expected long-run av- 
erage rewards [38,29,30,15]. The only multi-objective approach for MA so far [47] 
shows that any method for multi-objective MDP can be applied on (a discretized 
version of) an MA for queries involving unbounded or time-bounded reachability 
probabilities and expected total rewards, but no long-run average rewards. 


2 Preliminaries 


The set of probability distributions over a finite set $\Omega$ is given by
$Dist(\Omega) = \{\mu\colon \Omega \to [0,1] \mid \sum_{\omega \in \Omega} \mu(\omega) = 1\}$.
For a distribution $\mu \in Dist(\Omega)$ we let $supp(\mu) = \{\omega \in \Omega \mid \mu(\omega) > 0\}$
denote its support. $\mu$ is Dirac if $|supp(\mu)| = 1$.
Let $\mathbb{R}_{\ge 0} = \{x \in \mathbb{R} \mid x \ge 0\}$, $\mathbb{R}_{>0} = \{x \in \mathbb{R} \mid x > 0\}$,
and $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$ denote the non-negative, positive, and
extended real numbers, respectively. For a point $p = (p_1, \dots, p_\ell) \in \mathbb{R}^\ell$,
$\ell \in \mathbb{N}$ and $i \in \{1, \dots, \ell\}$ we write $p[i]$ for its $i$-th entry $p_i$.
For $p, q \in \mathbb{R}^\ell$ let $p \cdot q$ denote the dot product. We further write $p \le q$ iff
$\forall i\colon p[i] \le q[i]$ and $p \lneq q$ iff $p \le q \wedge p \ne q$. The closure of a set
$P \subseteq \mathbb{R}^\ell$ is the union of $P$ and its boundary, denoted by $cl(P)$. The convex hull
of $P$ is given by
$conv(P) = \{\sum_{i=1}^{\ell} \mu(i) \cdot p_i \mid \mu \in Dist(\{1, \dots, \ell\}),\ p_1, \dots, p_\ell \in P\}$.
The downward convex hull of $P$ is given by
$dwconv(P) = \{q \in \mathbb{R}^\ell \mid \exists p \in conv(P)\colon q \le p\}$.


2.1 Markov Automata 


Markov automata (MA) [25,23] provide an expressive formalism that allows one 
to model exponentially distributed delays, nondeterminism, probabilistic branch- 
ing, and instantaneous (undelayed) transitions. 


Definition 1. A Markov Automaton is a tuple $M = (S, Act, \Delta, P)$ where $S$ is a
finite set of states, $Act$ is a finite set of actions,
$\Delta\colon S \to \mathbb{R}_{>0} \cup 2^{Act}$ is a transition function assigning exit rates to
Markovian states $MS^M = \{s \in S \mid \Delta(s) \in \mathbb{R}_{>0}\}$ and sets of enabled actions
to probabilistic states $PS^M = \{s \in S \mid \Delta(s) \subseteq Act\}$, and
$P\colon MS^M \cup SA^M \to Dist(S)$ with $SA^M = \{(s,\alpha) \in PS^M \times Act \mid \alpha \in \Delta(s)\}$
is a probability function that assigns a distribution over possible successor states
for each Markovian state and enabled state-action pair.


Let $M = (S, Act, \Delta, P)$ be an MA. If $M$ is clear from the context, we may omit
the superscript from $MS^M$, $PS^M$, $SA^M$, and further notations introduced below.
Intuitively, the time $M$ stays in a Markovian state $s \in MS$ is governed by an
exponential distribution with rate $\Delta(s) \in \mathbb{R}_{>0}$, i.e., the probability to take a
transition from $s$ within $t \in \mathbb{R}_{\ge 0}$ time units is $1 - e^{-\Delta(s) \cdot t}$. Upon
taking a transition, a successor state $s' \in S$ is drawn from the distribution $P(s)$,
i.e., $P(s)(s')$ is the probability that the transition leads to $s' \in S$. For
probabilistic states $\hat{s} \in PS$, an enabled action $\alpha \in \Delta(\hat{s})$ has to be picked
and a successor state is drawn from $P((\hat{s},\alpha))$ (without any delay). Nondeterminism
is thus only possible at probabilistic states. We assume deadlock free MA, i.e.,
$\forall s \in PS^M\colon \Delta(s) \ne \emptyset$.
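As a small numerical illustration of the sojourn-time semantics above, the following sketch evaluates the probability that a Markovian state with a given exit rate is left within t time units. The function name `prob_leave_within` is hypothetical and chosen only for this example.

```python
import math


def prob_leave_within(rate: float, t: float) -> float:
    """Probability 1 - exp(-rate * t) that a Markovian state with exit
    rate `rate` takes its transition within `t` time units."""
    if rate <= 0.0:
        raise ValueError("exit rates of Markovian states are positive")
    if t < 0.0:
        raise ValueError("time bounds are non-negative")
    return 1.0 - math.exp(-rate * t)


# With rate 1, half of the probability mass lies before time ln(2).
median_probability = prob_leave_within(1.0, math.log(2.0))
```

Larger rates shift probability mass towards shorter delays, which is why rates, and not only branching probabilities, influence time-based measures.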


Remark 1. To enable more flexible modeling such as parallel compositions, the 
literature (e.g., [25,30]) often considers a more liberal variant of MA where (i) 
different successor distributions can be assigned to the same state-action pair and 
(ii) states can be both, Markovian and probabilistic. MAs as in Definition 1— 
also known as closed MA—are equally expressive: they can be constructed via 
action renaming and by applying the so-called maximal progress assumption [25]. 


An infinite path in $M$ is a sequence $\pi = s_0 \kappa_1 s_1 \kappa_2 \dots$ where for each
$i \ge 0$ either $s_i \in MS$, $\kappa_{i+1} \in \mathbb{R}_{\ge 0}$, and $P(s_i)(s_{i+1}) > 0$ or
$s_i \in PS$, $\kappa_{i+1} \in \Delta(s_i)$, and $P((s_i, \kappa_{i+1}))(s_{i+1}) > 0$. Intuitively, if
$s_i$ is Markovian, $\kappa_{i+1} \in \mathbb{R}_{\ge 0}$ reflects the time we have stayed in $s_i$ until
transitioning to $s_{i+1}$. If $s_i$ is probabilistic, $\kappa_{i+1} \in Act$ is the performed
action via which we transition to $s_{i+1}$. A finite path
$\hat{\pi} = s_0 \kappa_1 s_1 \kappa_2 \dots \kappa_n s_n$ is a finite prefix of an infinite path $\pi$.
We set $last(\hat{\pi}) = s_n$ and $|\hat{\pi}| = n$ for finite $\hat{\pi}$ and $|\pi| = \infty$ for
infinite $\pi$. For a (finite or infinite) path $\pi = s_0 \kappa_1 s_1 \kappa_2 \dots$ let
$dur(\pi) = \sum_{i=1}^{|\pi|} dur(\kappa_i)$ be the total duration of $\pi$ where
$dur(\kappa) = \kappa$ if $\kappa \in \mathbb{R}_{\ge 0}$ and 0 otherwise. If $\pi$ is infinite and
$dur(\pi) < \infty$, the path is called Zeno. For $k \in \mathbb{N}$ with $k \le |\pi|$ we let
$prefix_{steps}(\pi, k)$ denote the unique prefix $\pi'$ of $\pi$ with $|\pi'| = k$ and for
$t \in \mathbb{R}_{\ge 0}$ we let $prefix_{time}(\pi, t)$ denote the largest prefix of $\pi$ with
total duration at most $t$. The sets of infinite and finite paths of $M$ are given by
$Paths^M_{inf}$ and $Paths^M_{fin}$, respectively.
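The path notions above can be made concrete with a minimal sketch. The encoding is an assumption made for this example only: a finite path is a pair of the first state and a list of (label, successor) steps, where a numeric label is a sojourn time and a string label is an action name.

```python
from typing import List, Tuple, Union

Label = Union[float, str]                    # sojourn time or action name
Path = Tuple[str, List[Tuple[Label, str]]]   # (s_0, [(kappa_1, s_1), ...])


def dur_label(kappa: Label) -> float:
    """Duration of a single label: the time itself, or 0 for an action."""
    return float(kappa) if isinstance(kappa, (int, float)) else 0.0


def dur(path: Path) -> float:
    """Total duration of a finite path."""
    return sum(dur_label(kappa) for kappa, _ in path[1])


def prefix_steps(path: Path, k: int) -> Path:
    """The unique prefix consisting of the first k steps."""
    s0, steps = path
    if not 0 <= k <= len(steps):
        raise ValueError("k must not exceed the path length")
    return (s0, steps[:k])


def prefix_time(path: Path, t: float) -> Path:
    """The largest prefix whose total duration is at most t."""
    s0, steps = path
    elapsed, n = 0.0, 0
    for kappa, _ in steps:
        if elapsed + dur_label(kappa) > t:
            break
        elapsed += dur_label(kappa)
        n += 1
    return (s0, steps[:n])


example = ("s0", [(0.5, "s1"), ("a", "s2"), (1.5, "s0")])
```

Because durations are non-negative, the accumulated duration grows monotonically along the path, so the greedy scan in `prefix_time` indeed returns the largest admissible prefix.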

A component of $M$ is a set $C \subseteq MS \cup SA$. We set
$states(C) = (C \cap MS) \cup \{s \in PS \mid \exists \alpha\colon (s,\alpha) \in C\}$. $C$ is closed if
$\forall c \in C\colon supp(P(c)) \subseteq states(C)$ and connected if for all $s, s' \in states(C)$
there is $s_0 \kappa_1 \dots \kappa_n s_n \in Paths_{fin}$ with $s = s_0$, $s' = s_n$, and for each
$0 \le i < n$ either $s_i \in C \cap MS$ or $(s_i, \kappa_{i+1}) \in C \cap SA$. An end
component (EC) is a closed and connected component. An EC is maximal if it is
not a proper subset of another EC. $MECS(M)$ denotes the maximal ECs of $M$.
For an EC $C$ let $exits(C) = \{(s,\alpha) \in SA^M \mid s \in states(C) \text{ and } (s,\alpha) \notin C\}$.


Definition 2. The sub-MA of $M$ induced by a closed component $C$ is given by
$M[C] = (states(C), Act, \Delta_C, P_C)$ where $\Delta_C(s) = \Delta(s)$ if $s \in C \cap MS^M$ and
otherwise $\Delta_C(s) = \{\alpha \in \Delta(s) \mid (s,\alpha) \in C\}$, and $P_C$ is the restriction of $P$ to $C$.


A strategy for M resolves the nondeterminism at probabilistic states by providing 
probability distributions over enabled actions based on the execution history. 


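Definition 3. A (general) strategy for MA $M = (S, Act, \Delta, P)$ is a function
$\sigma\colon Paths_{fin} \to Dist(Act) \cup \{\tau\}$ such that for $\hat{\pi} \in Paths_{fin}$ we have
$\sigma(\hat{\pi}) \in Dist(\Delta(last(\hat{\pi})))$ if $last(\hat{\pi}) \in PS$ and $\sigma(\hat{\pi}) = \tau$ otherwise.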


A strategy $\sigma$ is called memoryless if the choice only depends on the current state,
i.e., $\forall \hat{\pi}, \hat{\pi}' \in Paths_{fin}\colon last(\hat{\pi}) = last(\hat{\pi}')$ implies
$\sigma(\hat{\pi}) = \sigma(\hat{\pi}')$. If all assigned distributions are Dirac, $\sigma$ is called
deterministic. Let $\Sigma^M$ and $\Sigma^M_{MD}$ denote the sets of general and memoryless
deterministic strategies of $M$, respectively. For simplicity, we often interpret
$\sigma \in \Sigma^M_{MD}$ as a function $\sigma\colon S \to Act \cup \{\tau\}$. The induced sub-MA for
$\sigma \in \Sigma^M_{MD}$ is given by $M[MS \cup \{(s, \sigma(s)) \mid s \in PS\}]$. Strategy
$\sigma \in \Sigma^M$ and initial state $s_I \in S$ define a probability measure $Pr^{M,s_I}_{\sigma}$ that
assigns probabilities to sets of infinite paths [38]. The expected value of
$f\colon Paths_{inf} \to \overline{\mathbb{R}}$ is given by the Lebesgue integral
$Ex^{M,s_I}_{\sigma}(f) = \int_{\pi \in Paths_{inf}} f(\pi) \, dPr^{M,s_I}_{\sigma}(\pi)$.


2.2 Reward-based Objectives 


MA can be equipped with rewards to model various quantities like, e.g., energy
consumption or the number of produced units. We distinguish between transition
rewards $R_{trans}\colon (MS \cup SA) \times S \to \mathbb{R}$ that are collected when transitioning from
one state to another and state rewards $R_{state}\colon S \to \mathbb{R}$ that are collected over
time, i.e., staying in state $s$ for $t$ time units yields a reward of $R_{state}(s) \cdot t$. Since
no time passes in probabilistic states, state rewards $R_{state}(s)$ for $s \in PS$ are not
relevant. A reward assignment combines the two notions.
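Definition 4. A reward assignment for MA $M$ and $R_{state}$, $R_{trans}$ as above is
a function $R\colon ((MS \times \mathbb{R}_{\ge 0}) \cup SA) \times S \to \mathbb{R}$ with

$$R((s,\kappa), s') = \begin{cases} R_{state}(s) \cdot \kappa + R_{trans}(s, s') & \text{if } s \in MS,\ \kappa \in \mathbb{R}_{\ge 0} \\ R_{trans}((s,\kappa), s') & \text{if } (s,\kappa) \in SA. \end{cases}$$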

We fix a reward assignment $R$ for $M$. $R$ can also be applied to any sub-MA
$M[C]$ of $M$ in a straightforward way. For a component $C \subseteq MS \cup SA$ we
write $R(C) \ge 0$ if all rewards assigned within $C$ are non-negative, formally
$\forall (s,\kappa) \in (C \cap SA) \cup ((C \cap MS) \times \mathbb{R}_{\ge 0})\colon \forall s' \in states(C)\colon R((s,\kappa), s') \ge 0$.
The shortcuts $R(C) \le 0$ and $R(C) = 0$ are similar. The reward of a finite path
$\hat{\pi} = s_0 \kappa_1 s_1 \kappa_2 \dots \kappa_n s_n$ is denoted by
$R(\hat{\pi}) = \sum_{i=1}^{n} R((s_{i-1}, \kappa_i), s_i)$.


Definition 5. The total reward objective for reward assignment $R$ is given by
$tot(R)\colon Paths_{inf} \to \overline{\mathbb{R}}$ with
$tot(R)(\pi) = \limsup_{k \to \infty} R(prefix_{steps}(\pi, k))$.


Definition 6. The long-run average (LRA) reward objective for $R$ is given by
$lra(R)\colon Paths_{inf} \to \overline{\mathbb{R}}$ with
$lra(R)(\pi) = \limsup_{t \to \infty} \frac{1}{t} \cdot R(prefix_{time}(\pi, t))$.


Sect. 4 considers assumptions under which the limit in both definitions can 
be attained, i.e., limsup can be replaced by lim. The incorporation of other
objectives such as reachability probabilities is discussed in Remark 3.
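To illustrate how rewards accumulate along a path, the sketch below evaluates the reward of a finite path and its time average, i.e., the quantity whose limit the LRA objective considers. The path encoding and the dictionary-based reward tables are assumptions made for this example; missing entries are treated as reward 0.

```python
def path_reward(path, r_state, r_trans):
    """Reward of a finite path (s_0, [(kappa_1, s_1), ...]).

    A numeric label is a sojourn time in a Markovian state, which earns the
    state reward times the sojourn time plus the transition reward keyed by
    (s, s'). A string label is an action taken in a probabilistic state,
    which earns only the transition reward keyed by ((s, action), s').
    """
    s_prev, steps = path
    total = 0.0
    for kappa, s_next in steps:
        if isinstance(kappa, (int, float)):
            total += r_state.get(s_prev, 0.0) * kappa
            total += r_trans.get((s_prev, s_next), 0.0)
        else:
            total += r_trans.get(((s_prev, kappa), s_next), 0.0)
        s_prev = s_next
    return total


def average_reward(path, r_state, r_trans):
    """Reward per time unit of a finite path with positive duration."""
    duration = sum(k for k, _ in path[1] if isinstance(k, (int, float)))
    if duration <= 0.0:
        raise ValueError("path has no duration")
    return path_reward(path, r_state, r_trans) / duration


# Hypothetical data: action "a" costs 1, staying in "m" earns 6 per time unit.
r_state = {"m": 6.0}
r_trans = {(("p", "a"), "m"): -1.0}
path = ("p", [("a", "m"), (2.0, "p")])
```

Here the path collects -1 for the action and 6 * 2 for the two time units spent in "m".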


2.3 Markov Decision Processes 


A Markov Decision Process (MDP) $M$ is an MA with only probabilistic states,
i.e., $MS^M = \emptyset$. All notions above also apply to MDP. However, since all paths
of an MDP have duration 0, there is no timing information available. For MDP,
we therefore usually consider steps instead of time. In particular, for reward
assignment $R$ we consider $lra_{steps}(R)$ instead of $lra(R)$, where
$lra_{steps}(R)(\pi) = \limsup_{k \to \infty} \frac{1}{k} \cdot R(prefix_{steps}(\pi, k))$.
Below, we focus on MA. Applying our results to step-based LRA rewards on MDPs is
straightforward. Time-based LRA reward objectives for MA cannot straightforwardly
be reduced to step-based measures for MDP due to the interplay of delayed and
undelayed transitions.


3 Efficient Multi-objective Model Checking 


We formalize common tasks in multi-objective model checking and sketch our
solution method based on [28]. We fix an MA $M = (S, Act, \Delta, P)$ with initial state
$s_I \in S$ and $\ell > 0$ objectives $f_1, \dots, f_\ell\colon Paths_{inf} \to \overline{\mathbb{R}}$ with
$F = (f_1, \dots, f_\ell)$. The notation for expected values is lifted to tuples:
$Ex_\sigma(F) = (Ex_\sigma(f_1), \dots, Ex_\sigma(f_\ell))$.


3.1 Multi-objective Model Checking Queries 


Our aim is to maximize the expected value for each (potentially conflicting) 
objective $f_j$. We impose the following assumption which can be asserted using
single-objective model checking. We further discuss the assumption in Remark 2. 
Figure 1: MA with achievable points and Pareto front for $F = (lra(R_1), tot(R_2))$.
Panels: (a) MA $M$ with rewards $(R_1, R_2)$; (b) $Ach(F)$ (green) and $Pareto(F)$ (blue),
with axes $lra(R_1)$ and $tot(R_2)$.


Assumption 1 (Objective Finiteness) $\forall j\colon \sup\{Ex_\sigma(f_j) \mid \sigma \in \Sigma\} < \infty$.


Definition 7. For $F$ as above, $Ach(F) = \{p \in \mathbb{R}^\ell \mid \exists \sigma \in \Sigma\colon p \le Ex_\sigma(F)\}$
is the set of achievable points. The Pareto front is given by
$Pareto(F) = \{p \in cl(Ach(F)) \mid \forall p' \gneq p\colon p' \notin cl(Ach(F))\}$.


A point $p \in Ach(F)$ is called achievable and there is a single strategy $\sigma$ that for
each objective $f_j$ achieves an expected value of at least $p[j]$. Due to Assumption 1,
the Pareto front is the frontier of the set of achievable points, meaning
that it is the smallest set $P \subseteq \mathbb{R}^\ell$ with $dwconv(P) = cl(Ach(F))$. We can thus
interpret $Pareto(F)$ as a representation for $cl(Ach(F))$ and vice versa. The set
of achievable points is closed iff all points on the Pareto front are achievable.


Example 1. Fig. 1a shows an MA with initial state $s_3$. Transitions are annotated
with actions, rates (boldfaced), and successor probabilities. We also depict two
reward assignments $R_1$ and $R_2$ by labeling states and transitions with tuples
$(r_1, r_2)$ where, e.g., $R_2((s_3, \alpha), s_1) = -1$ and for $t \in \mathbb{R}_{\ge 0}$:
$R_1((s_6, t), s_4) = 6 \cdot t$.

For $\sigma_1 \in \Sigma_{MD}$ with $\sigma_1\colon s_3, s_4 \mapsto \alpha$, the EC
$\{s_5, (s_4, \alpha), (s_4, \beta), s_6\}$ is reached almost surely (with probability 1), yielding
$Ex_{\sigma_1}(lra(R_1)) = 0.6 \cdot 6 + 0.4 \cdot 1 = 4$ and
$Ex_{\sigma_1}(tot(R_2)) = \sum_{i=0}^{\infty} -1 \cdot (0.5)^i = -2$. It follows that the point
$p_1 = (4, -2)$ as indicated in Fig. 1b is achievable. Similarly, $\sigma_2 \in \Sigma_{MD}$ with
$\sigma_2\colon s_3 \mapsto \beta, s_4 \mapsto \alpha$ achieves the point $p_2 = (3, 0)$. With strategies
that randomly pick an action at $s_3$, we can also achieve any point on the blue line in
Fig. 1b that connects $p_1$ and $p_2$. This line coincides with the Pareto front
$Pareto(F)$ for $F = (lra(R_1), tot(R_2))$. The set of achievable points $Ach(F)$
(indicated in green) coincides with the downward convex hull of the Pareto front.
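The geometry of Example 1 can be checked numerically. The sketch below uses only the two points (4, -2) and (3, 0) stated in the example; the helper names `mix` and `in_dwconv_of_segment` are hypothetical. It computes convex combinations of the two points and tests membership in the downward convex hull of the segment between them.

```python
def mix(p, q, w):
    """Convex combination w * p + (1 - w) * q of two points."""
    return tuple(w * a + (1.0 - w) * b for a, b in zip(p, q))


def in_dwconv_of_segment(q, p1, p2, tol=1e-12):
    """Check whether q <= w * p1 + (1 - w) * p2 holds for some w in [0, 1].

    Every coordinate yields a linear constraint on w, so it suffices to
    intersect the resulting intervals.
    """
    lo, hi = 0.0, 1.0
    for qj, aj, bj in zip(q, p1, p2):
        d = aj - bj                      # constraint: qj <= bj + w * d
        if abs(d) <= tol:
            if qj > bj + tol:
                return False
        elif d > 0.0:
            lo = max(lo, (qj - bj) / d)
        else:
            hi = min(hi, (qj - bj) / d)
    return lo <= hi + tol


p1, p2 = (4.0, -2.0), (3.0, 0.0)
midpoint = mix(p1, p2, 0.5)              # a point on the Pareto front
```

The point (4, 0) would require the best value of both objectives simultaneously and is therefore rejected, whereas every point below the segment is accepted.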


For multi-objective model checking we are concerned with the following queries: 


MULTI-OBJECTIVE MODEL CHECKING QUERIES 
Qualitative Achievability: Given point $p \in \mathbb{R}^\ell$, decide if $p \in Ach(F)$.
Quantitative Achievability: Given $p_2, p_3, \dots, p_\ell \in \mathbb{R}$, compute or approximate
$\sup\{p \in \mathbb{R} \mid (p, p_2, p_3, \dots, p_\ell) \in Ach(F)\}$.
Pareto: Compute or approximate $Pareto(F)$.
Input : MA $M$ with initial state $s_I$, objectives $F = (f_1, \dots, f_\ell)$
Output : An approximation of $Ach(F)$
1 $P \leftarrow \emptyset$ // Collects achievable points found so far.
2 $Q \leftarrow \mathbb{R}^\ell$ // Excludes points that are known to be unachievable.
3 repeat
4   Select weights $w \in \{w' \in (\mathbb{R}_{\ge 0})^\ell \mid \sum_j w'[j] = 1\}$ and $\varepsilon \ge 0$
5   Find $v_w \ge \sup\{w \cdot Ex_\sigma(F) \mid \sigma \in \Sigma\}$, $\sigma_w \in \Sigma$ s.t. $|v_w - w \cdot Ex_{\sigma_w}(F)| \le \varepsilon$
6   Compute $p_w \in \mathbb{R}^\ell$ with $\forall j\colon p_w[j] = Ex_{\sigma_w}(f_j)$
7   $P \leftarrow P \cup \{p_w\}$; $Q \leftarrow Q \cap \{p \in \mathbb{R}^\ell \mid w \cdot p \le v_w\}$
8 until Approximation $dwconv(P) \subseteq Ach(F) \subseteq Q$ answers multi-obj. query

Algorithm 1: Approximating the set of achievable points


3.2 Approximation of Achievable Points 


A practically efficient approach that tackles the above queries for expected total
rewards in MDP was given in [28]. It is based on so-called sandwich algorithms
known from convex multi-objective optimization [53,51]. We extend the algorithm
to arbitrary combinations of objectives $f_j$ on MA, including mixtures of total-
and LRA reward objectives, which is the main algorithmic novelty.
The idea is to iteratively refine an approximation of the set of achievable
points $Ach(F)$. The refinement loop is outlined in Algorithm 1. At the start of
each iteration, the algorithm chooses a weight vector $w$ and a precision parameter
$\varepsilon$ according to some heuristic (details below). Then, Line 5 considers the weighted
sum of the expected values of the objectives $f_j$. More precisely, an upper bound $v_w$
for $\sup\{w \cdot Ex_\sigma(F) \mid \sigma \in \Sigma\}$ as well as a “near optimal” strategy $\sigma_w$ need to
be found such that the difference between the bound $v_w$ and the weighted sum
induced by $\sigma_w$ is at most $\varepsilon$. In Sect. 4, we outline the computation of $v_w$ and $\sigma_w$
for the case where $F$ consists of total- and LRA reward objectives. Next, in Line 6
the algorithm computes a point $p_w$ that contains the expected values for each
individual objective $f_j$ under strategy $\sigma_w$. These values can be computed using
off-the-shelf single-objective model checking algorithms on the model induced
by $\sigma_w$. By definition, $p_w$ is achievable. Finally, Line 7 inserts the found point
into the initially empty set $P$ and excludes points from the set $Q$ (which initially
contains all points) that are known to be unachievable. The following theorem
establishes the correctness of the approach. We prove it using Lemmas 1 and 2.


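Theorem 1. Algorithm 1 maintains the invariant $dwconv(P) \subseteq Ach(F) \subseteq Q$.

Lemma 1. $\forall p \in Ach(F), w \in (\mathbb{R}_{\ge 0})^\ell\colon w \cdot p \le \sup\{w \cdot Ex_\sigma(F) \mid \sigma \in \Sigma\}$.

Proof. Let $p \in Ach(F)$ be achieved by strategy $\sigma_p \in \Sigma$. The claim follows from

$$w \cdot p = \sum_{j=1}^{\ell} w[j] \cdot p[j] \le \sum_{j=1}^{\ell} w[j] \cdot Ex_{\sigma_p}(f_j) \le \sup\Big\{ \sum_{j=1}^{\ell} w[j] \cdot Ex_\sigma(f_j) \,\Big|\, \sigma \in \Sigma \Big\}.$$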

Lemma 2. $Ach(F)$ is convex, i.e., $Ach(F) = conv(Ach(F))$.

Proof. We need to show that for two points $p_1, p_2 \in Ach(F)$ with achieving
strategies $\sigma_1, \sigma_2 \in \Sigma$, any point $p$ on the line connecting $p_1$ and $p_2$ is also
achievable. Formally, for $w \in [0, 1]$ show that $p_w = w \cdot p_1 + (1-w) \cdot p_2 \in Ach(F)$.
Consider the strategy $\sigma_w$ that initially makes a coin flip¹: With probability $w$ it
mimics $\sigma_1$ and otherwise it mimics $\sigma_2$. We can show for all objectives $f_j$:

$$p_w[j] = w \cdot p_1[j] + (1-w) \cdot p_2[j] \le w \cdot Ex_{\sigma_1}(f_j) + (1-w) \cdot Ex_{\sigma_2}(f_j) = Ex_{\sigma_w}(f_j).$$

We now show Theorem 1. A similar proof was given in [28].


Proof (of Theorem 1). All $p_w \in P$ are achievable, i.e., $P \subseteq Ach(F)$. By Definition 7
and Lemma 2 we get $dwconv(P) \subseteq dwconv(Ach(F)) = conv(Ach(F)) = Ach(F)$.
Now let $p \in Ach(F)$ and let $w$ be an arbitrary weight vector considered in some
iteration of Algorithm 1 with corresponding value $v_w$ computed in Line 5.
Lemma 1 yields $w \cdot p \le \sup\{w \cdot Ex_\sigma(F) \mid \sigma \in \Sigma\} \le v_w$ and thus $p \in Q$.


Algorithm 1 can be stopped at any time and the current approximation of 
Ach(F) can be used to (i) decide qualitative achievability, (ii) provide a lower 
and an upper bound for quantitative achievability, and (iii) obtain an approxi- 
mative representation of the Pareto front. 
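The refinement loop of Algorithm 1 can be sketched as follows. This is an illustration under strong simplifying assumptions, not the implementation evaluated in this paper: the weighted-sum step is delegated to a hypothetical `oracle` that returns an exact optimum (precision 0), the weights are supplied up front instead of being chosen heuristically, the set Q is stored as a list of half-spaces, and the under-approximation is only tested against individual points of P (a sufficient but not necessary check for membership in the downward convex hull).

```python
def dot(w, p):
    """Dot product of a weight vector and a point."""
    return sum(a * b for a, b in zip(w, p))


def refine(oracle, weights):
    """Run one refinement step per weight vector.

    Returns the achievable points found (the set P) and the half-spaces
    {p | w . p <= v_w} whose intersection over-approximates Ach(F) (the set Q).
    """
    points, halfspaces = [], []
    for w in weights:
        v_w, p_w = oracle(w)       # weighted-sum optimum and achieving point
        points.append(p_w)
        halfspaces.append((w, v_w))
    return points, halfspaces


def surely_unachievable(q, halfspaces, tol=1e-9):
    """q lies outside Q, hence it cannot be achievable."""
    return any(dot(w, q) > v_w + tol for w, v_w in halfspaces)


def surely_achievable(q, points):
    """q is dominated by a single found point (sufficient condition only)."""
    return any(all(a <= b for a, b in zip(q, p)) for p in points)


def toy_oracle(w):
    """Hypothetical oracle over the two points known from Example 1."""
    candidates = [(4.0, -2.0), (3.0, 0.0)]
    best = max(candidates, key=lambda p: dot(w, p))
    return dot(w, best), best


P, Q = refine(toy_oracle, [(1.0, 0.0), (0.0, 1.0), (2 / 3, 1 / 3)])
```

After the first two weight vectors, the point (4, 0) is still undecided; the third weight vector excludes it. Points for which neither check succeeds remain undecided until further refinement.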

The precision parameter $\varepsilon$ can be decreased dynamically to obtain a gradually
finer approximation. If $Ach(F)$ is closed, the supremum $\sup\{w \cdot Ex_\sigma(F) \mid \sigma \in \Sigma\}$
can be attained by some strategy $\sigma_w$, allowing us to set $\varepsilon = 0$.

We briefly sketch the selection of weight vectors as proposed in [28]. In the
first $\ell$ iterations of Algorithm 1, we optimize each objective $f_j$ individually, i.e.,
we consider for all $j$ the weight vector $w$ with $w[i] = 0$ for $i \ne j$ and $w[j] = 1$.
After that, we consider weight vectors that are orthogonal to a facet of the
downward convex hull of the current set of points $P$. To approximate the Pareto
front, facets with a large distance to $\mathbb{R}^\ell \setminus Q$ are considered first. To answer a
qualitative or quantitative achievability query, the selection can be guided further
based on the input point $p \in \mathbb{R}^\ell$ or the input values $p_2, p_3, \dots, p_\ell \in \mathbb{R}$. More
details and further discussions on these heuristics can be found in [28].
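For two objectives, a weight vector orthogonal to a facet can be computed directly. The following sketch assumes that the facet is the segment between two mutually non-dominated points; the function name `facet_weight_2d` is hypothetical and the code is not taken from [28].

```python
def facet_weight_2d(p, q):
    """Non-negative weight vector with entries summing to 1 that is
    orthogonal to the segment between the 2D points p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if dx * dy >= 0.0:
        raise ValueError("points must be mutually non-dominated")
    nx, ny = abs(dy), abs(dx)      # normal vector of a negatively sloped segment
    total = nx + ny
    return (nx / total, ny / total)


# Facet between the two points of Example 1.
w = facet_weight_2d((4.0, -2.0), (3.0, 0.0))
```

For the points (4, -2) and (3, 0) this gives the weights (2/3, 1/3), under which both points have the same weighted sum.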


Remark 2. Assumption 1 does not exclude $Ex_\sigma(f_j) = -\infty$ which occurs, e.g.,
when objectives reflect resource consumption and some (bad) strategies require
infinite resources. Moreover, if Assumption 1 is violated for an objective $f_j$
we observe that for this objective, any (arbitrarily high) value $p \in \mathbb{R}$ can be
achieved with some strategy $\sigma \in \Sigma$ such that $p \le Ex_\sigma(f_j)$. Similar to the proof
of Lemma 2, a strategy can be constructed that mimics, with a small probability,
a strategy inducing a very high expected value for $f_j$ and, with the
remaining (high) probability, optimizes for the other objectives. Let $F_{-j}$ be
the tuple $F$ without $f_j$ and similarly for $p \in \mathbb{R}^\ell$ let $p_{-j} \in \mathbb{R}^{\ell-1}$ be the point $p$
without the $j$-th entry. Assuming $\inf\{Ex_\sigma(f_j) \mid \sigma \in \Sigma\} > -\infty$, we can show that
$cl(Ach(F)) = \{p \in \mathbb{R}^\ell \mid p_{-j} \in cl(Ach(F_{-j}))\}$. Put differently, $cl(Ach(F))$ can
be constructed from the achievable points obtained without the objective $f_j$.


1 Strategies as in Definition 3 cannot “store” the outcome of the initial coin flip. Thus,
given $\hat{\pi} \in Paths_{fin}$, strategy $\sigma_w$ actually has to consider the conditional probability
for the outcome of the coin flip, given that $\hat{\pi}$ has been observed. Alternatively, we
could have also introduced strategies with memory.
4 Optimizing Weighted Combinations of Objectives


We now analyze weighted sums of expected values as in Line 5 of Algorithm 1.


WEIGHTED SUM OPTIMIZATION PROBLEM
Input: MA $M$ with initial state $s_I$, objectives $F = (f_1, \dots, f_\ell)$,
weight vector $w \in \{w' \in (\mathbb{R}_{\ge 0})^\ell \mid \sum_j w'[j] = 1\}$, precision $\varepsilon \ge 0$
Output: Value $v_w \in \mathbb{R}$ with $v_w \ge \sup\{w \cdot Ex_\sigma(F) \mid \sigma \in \Sigma\}$ and
strategy $\sigma_w \in \Sigma$ such that $|v_w - w \cdot Ex_{\sigma_w}(F)| \le \varepsilon$.


We only consider total- and LRA reward objectives. Remark 3 discusses other 
objectives. We show that instead of a weighted sum of the expected values we can 
consider weighted sums of the rewards. This allows us to combine all objectives 
into a single reward assignment and then apply single-objective model checking. 


4.1 Pure Long-run Average Queries 


Initially, we restrict ourselves to LRA objectives and show a reduction of the 
weighted sum optimization problem to a single-objective long-run average reward 
computation. As usual for MA [38,29,17] we forbid so-called Zeno behavior. 


Assumption 2 (Non-Zenoness) $\forall \sigma \in \Sigma^M\colon Pr^M_\sigma(\{\pi \mid dur(\pi) < \infty\}) = 0$.


The assumption is equivalent to assuming that every EC of $M$ contains at least
one Markovian state. If the assumption holds, the limit in Definition 6 can be
attained almost surely (with probability 1) and corresponds to a value $v \in \mathbb{R}$.
Thus, Assumption 1 for LRA objectives is already implied by Assumption 2. Let
$F_{lra} = (lra(R_1), \dots, lra(R_\ell))$ with reward assignments $R_j$. Moreover, for weight
vector $w$ let $R_w$ be the reward assignment with
$R_w((s,\kappa), s') = \sum_{j=1}^{\ell} w[j] \cdot R_j((s,\kappa), s')$.

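Theorem 2. $\forall \sigma \in \Sigma\colon w \cdot Ex_\sigma(F_{lra}) = Ex_\sigma(lra(R_w))$.


Proof. Due to Assumption 2 we have for almost all paths $\pi \in Paths_{inf}$ that for
all $j \in \{1, \dots, \ell\}$ the limit $\lim_{t \to \infty} \frac{1}{t} \cdot R_j(prefix_{time}(\pi, t))$ exists and

$$\sum_{j=1}^{\ell} w[j] \cdot lra(R_j)(\pi) = \lim_{t \to \infty} \frac{1}{t} \cdot \sum_{j=1}^{\ell} w[j] \cdot R_j(prefix_{time}(\pi, t)) = lra(R_w)(\pi).$$

The theorem follows with

$$\sum_{j=1}^{\ell} w[j] \cdot Ex_\sigma(lra(R_j)) = \int_{\pi \in Paths_{inf}} \Big( \sum_{j=1}^{\ell} w[j] \cdot lra(R_j)(\pi) \Big) \, dPr_\sigma(\pi) = Ex_\sigma(lra(R_w)).$$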

Due to Theorem 2, it suffices to consider the expected LRA reward for the single
reward assignment $R_w$. The supremum $\sup\{Ex_\sigma(lra(R_w)) \mid \sigma \in \Sigma\}$ is attained
by some memoryless deterministic strategy $\sigma_w \in \Sigma_{MD}$ [30]. Such a strategy and
the induced value $v_w = Ex_{\sigma_w}(lra(R_w))$ can be computed (or approximated)
with linear programming [30], strategy iteration [42] or value iteration [17,1].
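The weighted reward assignment used in Theorem 2 is a plain linear combination of the individual reward assignments. As a minimal sketch, assume each reward assignment is stored as a dictionary from keys (e.g., state or transition identifiers) to reward values, with missing keys meaning reward 0; this storage format and the function name `weighted_reward` are assumptions for illustration.

```python
def weighted_reward(rewards, w):
    """Combine reward tables R_1, ..., R_l into a single table R_w with
    R_w[key] = sum_j w[j] * R_j[key]."""
    if len(rewards) != len(w):
        raise ValueError("need exactly one weight per reward assignment")
    keys = set().union(*(r.keys() for r in rewards))
    return {
        key: sum(wj * r.get(key, 0.0) for wj, r in zip(w, rewards))
        for key in keys
    }


# Two hypothetical reward tables over the same transitions.
R1 = {("s", "t"): 2.0}
R2 = {("s", "t"): -1.0, ("t", "s"): 4.0}
Rw = weighted_reward([R1, R2], (0.25, 0.75))
```

Since both state rewards and transition rewards enter the reward assignment linearly, combining the underlying tables in this way yields the combined reward assignment, and a single-objective LRA solver can then be run on it.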
4.2 A Two-phase Approach for Single-objective LRA


The computation of single-objective expected LRA rewards for reward assignment
$R_w$ can be divided in two phases [29,17,1]. First, each maximal end component
$C \in MECS(M)$ is analyzed individually by computing for sub-MA $M[C]$
and some² $s \in states(C)$ the value
$v_C = \max\{Ex^{M[C],s}_\sigma(lra(R_w)) \mid \sigma \in \Sigma^{M[C]}\}$.

Secondly, we consider a quotient model $M' = M\backslash_{MECS(M)}$ of $M$ that
replaces the states of each $C \in MECS(M)$ by a single state.


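Definition 8. For $M = (S, Act, \Delta, P)$ and a set of ECs $\mathcal{C}$, the quotient is the
MA $M\backslash_{\mathcal{C}} = (S\backslash_{\mathcal{C}}, Act\backslash_{\mathcal{C}}, \Delta\backslash_{\mathcal{C}}, P\backslash_{\mathcal{C}})$ where
- $S\backslash_{\mathcal{C}} = (S \setminus \bigcup_{C \in \mathcal{C}} states(C)) \uplus \mathcal{C} \uplus \{s_\bot\}$,
  $Act\backslash_{\mathcal{C}} = Act \uplus (\bigcup_{C \in \mathcal{C}} exits(C)) \uplus \{\bot\}$,
- $\Delta\backslash_{\mathcal{C}}(\hat{s}) = \begin{cases} \Delta(\hat{s}) & \text{if } \hat{s} \in S \\ exits(\hat{s}) \cup \{\bot\} & \text{if } \hat{s} \in \mathcal{C} \\ 1 & \text{if } \hat{s} = s_\bot, \end{cases}$ and
- $P\backslash_{\mathcal{C}}(c) = \begin{cases} P(c) & \text{if } c \in MS^M \cup SA^M \\ P((s,\alpha)) & \text{if } c = (C, (s,\alpha)) \text{ for } C \in \mathcal{C} \text{ and } (s,\alpha) \in exits(C) \\ \{s_\bot \mapsto 1\} & \text{if } c \in (\mathcal{C} \times \{\bot\}) \cup \{s_\bot\}. \end{cases}$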


Intuitively, selecting action $\bot$ at a state $C \in MECS(M)$ in $M'$ reflects any
strategy of $M$ that upon visiting the EC $C$ will stay in this EC forever. We
can thus mimic any strategy of the sub-MA $M[C]$, in particular a memoryless
deterministic strategy that maximizes the expected value of $lra(R_w)$ in $M[C]$.
In contrast, selecting an action $(s,\alpha)$ at a state $C$ of $M'$ reflects a strategy of
$M$ that upon visiting the EC $C$ enforces that the states of $C$ will be left via
the exiting state-action pair $(s,\alpha)$. Let $R^*$ be the reward assignment for $M'$
that yields $R^*((C,\bot), s_\bot) = v_C$ and 0 in all other cases. It can be shown that
$\max\{Ex^{M,s_I}_\sigma(lra(R_w)) \mid \sigma \in \Sigma^M\} = \max\{Ex^{M',s'_I}_\sigma(tot(R^*)) \mid \sigma \in \Sigma^{M'}\}$, where
$s'_I = C$ if $s_I$ is contained in some $C \in MECS(M)$ and $s'_I = s_I$ otherwise.

The maximal total reward in $M'$ can be computed using standard techniques
such as value iteration and policy iteration [46] as well as the more
recent sound value iteration and optimistic value iteration [48,36]. The latter
two provide sound precision guarantees for the output value $v$, i.e.,
$|v - \max\{Ex^{M',s'_I}_\sigma(tot(R^*)) \mid \sigma \in \Sigma^{M'}\}| \le \varepsilon$ for a given $\varepsilon > 0$.
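The second phase can be illustrated with plain value iteration on a tiny quotient-like model. Everything in this sketch is an assumption made for illustration: the model is a dictionary mapping each state to its actions, each action to a list of (probability, successor, reward) triples; the collapsed ECs "C1", "C2", "C3" and their LRA values 6, 1, and 3 are invented. Plain value iteration with a difference-based stopping criterion does not give the sound precision guarantees of the methods cited above.

```python
def max_total_reward(model, sink, max_iters=10_000, tol=1e-12):
    """Standard value iteration for maximal expected total rewards.

    `model[s][a]` is a list of (probability, successor, reward) triples.
    The absorbing state `sink` has value 0. All other states are assumed
    to reach the sink almost surely under every strategy.
    """
    values = {s: 0.0 for s in model}
    values[sink] = 0.0
    for _ in range(max_iters):
        updated = dict(values)
        for s, actions in model.items():
            if s == sink:
                continue
            updated[s] = max(
                sum(p * (r + values[t]) for p, t, r in successors)
                for successors in actions.values()
            )
        if max(abs(updated[s] - values[s]) for s in values) < tol:
            return updated
        values = updated
    return values


# Staying in a collapsed EC forever (action "bot") pays its optimal LRA value.
quotient = {
    "s0": {"a": [(0.6, "C1", 0.0), (0.4, "C2", 0.0)], "b": [(1.0, "C3", 0.0)]},
    "C1": {"bot": [(1.0, "s_bot", 6.0)]},
    "C2": {"bot": [(1.0, "s_bot", 1.0)]},
    "C3": {"bot": [(1.0, "s_bot", 3.0)]},
}
values = max_total_reward(quotient, "s_bot")
```

In this toy model, action "a" yields 0.6 * 6 + 0.4 * 1 = 4 from "s0", which exceeds the value 3 obtained via action "b".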


4.3 Combining Long-run Average and Total Rewards 


We now consider arbitrary combinations of total- and long-run average reward
objectives $F = (tot(R_1), \dots, tot(R_k), lra(R_{k+1}), \dots, lra(R_\ell))$ with $0 < k < \ell$.
The above-mentioned procedure for LRA reduces the analysis to an expected
total reward computation on the quotient model $M\backslash_{MECS(M)}$. This approach
suggests to also incorporate other total-reward objectives for $M$ in the quotient
model. However, special care has to be taken concerning total rewards collected
within ECs of $M$ that would no longer be present in the quotient $M\backslash_{MECS(M)}$.
We discuss how to deal with this issue by considering the quotient only for ECs
in which no (total) reward is collected. We start with restricting the (total)
rewards that might be assigned to transitions within EC.


2 The value $v_C$ does not depend on the selected state $s$. Intuitively, this is because
any other state $s' \in states(C)$ can be reached from $s$ almost surely.


Assumption 3 (Sign-Consistency) For all total reward objectives $tot(R_j)$
either $\forall C \in MECS(M)\colon R_j(C) \ge 0$ or $\forall C \in MECS(M)\colon R_j(C) \le 0$.


The assumption implies that paths on which infinitely many positive and
infinitely many negative rewards are collected have probability 0. One consequence
is that the limit in Definition 5 exists for almost all paths [3]. A discussion on
objectives $tot(R_j)$ that violate Assumption 3 for single-objective MDP is given
in [3]. Their multi-objective treatment is left for future work.

When Assumptions 1 and 3 hold, we get $R_j(C) \le 0$ for all objectives $tot(R_j)$
and EC $C$. Put differently, all non-zero total rewards collected in an EC have
to be negative. Strategies that induce a total reward of $-\infty$ for some objective
$tot(R_j)$ will not be taken into account for the set of achievable points. Therefore,
transitions within ECs that yield negative reward should only be taken finitely
often. These transitions can be disregarded when computing the expected LRA
rewards, i.e., only the 0-ECs [3] are relevant for the LRA computation.


Definition 9. A 0-EC of $M$ and $R_1, \dots, R_k$ is an EC $C$ of $M$ with $R_j(C) = 0$
for all $R_j$. The set of maximal 0-ECs is given by $MECS_0(M, (R_1, \dots, R_k))$.


$MECS_0(M, (R_1, \dots, R_k))$ can be computed by constructing the maximal ECs
of the sub-MA of $M$ where transitions with a non-zero reward are erased.

We are ready to describe our approach that combines LRA rewards of 0-ECs
and the remaining total rewards into a single total-reward objective. Let $R^{tot}_w$
and $R^{lra}_w$ be reward assignments with
$R^{tot}_w((s,\kappa), s') = \sum_{j=1}^{k} w[j] \cdot R_j((s,\kappa), s')$ and
$R^{lra}_w((s,\kappa), s') = \sum_{j=k+1}^{\ell} w[j] \cdot R_j((s,\kappa), s')$. Moreover, for
$\pi \in Paths_{inf}$ we set
$(tot(R^{tot}_w) + lra(R^{lra}_w))(\pi) = tot(R^{tot}_w)(\pi) + lra(R^{lra}_w)(\pi)$.


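Theorem 3. $\forall \sigma \in \Sigma\colon w \cdot Ex_\sigma(F) = Ex_\sigma(tot(R^{tot}_w) + lra(R^{lra}_w))$.


Proof. Using a similar reasoning as in the proof of Theorem 2, we get:

$$\begin{aligned} w \cdot Ex_\sigma(F) &= \Big( \sum_{i=1}^{k} w[i] \cdot Ex_\sigma(tot(R_i)) \Big) + \Big( \sum_{j=k+1}^{\ell} w[j] \cdot Ex_\sigma(lra(R_j)) \Big) \\ &= Ex_\sigma(tot(R^{tot}_w)) + Ex_\sigma(lra(R^{lra}_w)) = Ex_\sigma(tot(R^{tot}_w) + lra(R^{lra}_w)). \end{aligned}$$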


Algorithm 2 outlines the procedure for solving the weighted sum optimization
problem. It first computes optimal LRA rewards and inducing strategies for
each maximal 0-EC (Lines 1 to 3). Then, a quotient model $M^*$ and a reward
assignment $R^*$ incorporating all total- and LRA rewards is built and analyzed
(Lines 4 to 6). $M^*$ might still contain ECs other than $\{s_\bot\}$. Those ECs shall
be left eventually to avoid collecting infinite negative reward for a total reward
objective $tot(R_i)$. Note that the weight $w[i]$ for such an objective might be zero,
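Input : MA $M$ with initial state $s_I$, objectives
        $F = (tot(R_1), \dots, tot(R_k), lra(R_{k+1}), \dots, lra(R_\ell))$, weight vector $w$
Output : Value $v_w$, strategy $\sigma_w$ as in the weighted sum optimization problem
1 $\mathcal{C} \leftarrow MECS_0(M, (R_1, \dots, R_k))$ // Compute maximal 0-ECs and their LRA.
2 foreach $C \in \mathcal{C}$ do
3   Compute $v_C = \max\{Ex^{M[C]}_\sigma(lra(R^{lra}_w)) \mid \sigma \in \Sigma^{M[C]}\}$
    and inducing strategy $\sigma_C \in \Sigma^{M[C]}_{MD}$
4 $M^* \leftarrow M\backslash_{\mathcal{C}}$ // Build and analyze quotient model.
5 Build reward assignment $R^*$ with
  $R^*((\hat{s},\kappa), \hat{s}') = \begin{cases} v_C & \text{if } \hat{s} = C,\ \kappa = \bot, \text{ and } \hat{s}' = s_\bot \\ R^{tot}_w((s,\alpha), \hat{s}') & \text{if } \hat{s} = C,\ \kappa = (s,\alpha) \in exits(C) \\ R^{tot}_w((\hat{s},\kappa), \hat{s}') & \text{otherwise} \end{cases}$
6 Compute $v_w = \max\{Ex^{M^*}_\sigma(tot(R^*)) \mid \sigma \in \Sigma^{M^*},\ Pr^{M^*}_\sigma(\Diamond\{s_\bot\}) = 1\}$
  and inducing strategy $\sigma^* \in \Sigma^{M^*}_{MD}$
7 Build strategy $\sigma_w \in \Sigma^M_{MD}$ with
  $\sigma_w(s) = \begin{cases} \sigma_C(s) & \text{if } \exists C \in \mathcal{C}\colon s \in states(C) \text{ and } \sigma^*(C) = \bot \\ \alpha & \text{if } \exists C \in \mathcal{C}\colon s \in states(C) \text{ and } \sigma^*(C) = (s,\alpha) \\ \sigma_{C \Diamond s'}(s) & \text{if } \exists C \in \mathcal{C}\colon s \in states(C) \text{ and } \sigma^*(C) = (s',\alpha) \text{ for } s' \ne s \\ \sigma^*(s) & \text{otherwise} \end{cases}$

Algorithm 2: Optimizing the weighted sum for total and LRA objectives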


i.e., the rewards of $R_i$ are not present in $R^*$. It is therefore necessary to explicitly
restrict the analysis to strategies that almost surely (i.e., with probability 1)
reach $s_\bot$. To compute the maximal expected total reward in Line 6 with, e.g.,
standard value iteration, we can consider another quotient model for $M^*$ and
the 0-ECs of $M^*$ and $R^*$. In contrast to Definition 8, this quotient should not
introduce the $\bot$ action since it shall not be possible to remain in an EC forever.
In Line 7, the strategies for the 0-ECs and for the quotient $M^*$ are combined
into one strategy $\sigma_w$ for $M$. Here, $\sigma_{C \Diamond s'}$ refers to a strategy of $M[C]$ for which
every state $s \in states(C)$ eventually reaches $s' \in states(C)$ almost surely.

Since Algorithm 2 produces a memoryless deterministic strategy $\sigma_w$, the
point $p_w \in \mathbb{R}^\ell$ in Line 6 of Algorithm 1 can be computed on the induced sub-MA
for $\sigma_w$. Assuming exact single-objective solution methods, the resulting value
$v_w$ and strategy $\sigma_w \in \Sigma^M_{MD}$ of Algorithm 2 satisfy $v_w = w \cdot Ex_{\sigma_w}(F)$, yielding
an exact solution to the weighted sum optimization problem. As the number
of memoryless deterministic strategies is bounded, we conclude the following,
extending results for pure LRA queries [11] to mixtures with total rewards.


Corollary 1. For total- and LRA reward objectives $F$, $Ach(F)$ is closed and is
the downward convex hull of at most $|\Sigma^M_{MD}| = \prod_{s \in PS} |\Delta(s)|$ points.


Remark 3. Our framework can be extended to support objectives beyond total- 
and LRA rewards. Minimizing objectives where one is interested in a strategy $\sigma$
that induces a small expected value can be considered by multiplying all rewards
with $-1$. Since we already allow negative values in reward assignments, no further
adaptations are necessary. We emphasize that our framework lifts a restriction
imposed in [28] that disabled a simultaneous analysis of maximizing and mini- 
mizing total reward objectives. Reachability probabilities can be transformed to 
expected total rewards on a modified model in which the information whether a 
goal state has already been visited is stored in the state-space. Goal-bounded to- 
tal rewards as in [30], where no further rewards are collected as soon as one of the 
goal states is reached can be transformed similarly. For MDP, step- and reward- 
bounded reachability probabilities can be converted to total reward objectives 
by unfolding the current amount of steps (or rewards) into the state-space of the 
model. Approaches that avoid such an expensive unfolding have been presented 
in [28] for objectives with step-bounds and in [34,35] for objectives with one or 
multiple reward-bounds. Time-bounded reachability probabilities for MA have 
been considered in [47]. Finally, $\omega$-regular specifications such as linear temporal
logic (LTL) formulae have been transformed to total reward objectives in [27].
However, the optimization of LRA rewards within the ECs of the model might
interfere with the satisfaction of one or more $\omega$-regular specifications [31].


5 Experimental Evaluation 


Implementation details Our approach has been implemented in the model checker 
STORM [40]. Given an MA or MDP (specified using the PRISM language or 
JANI [14]), the tool answers qualitative- and quantitative achievability as well as 
Pareto queries. Besides mixtures of total- and LRA reward objectives, STORM
also supports most of the extensions in Remark 3—with the notable exception of 
LTL. We use LRA value iteration [17,1] and sound value iteration [48] for calls to 
single-objective model checking. Both provide sound precision guarantees, i.e., 
the relative error of these computations is at most $\varepsilon$, where we set $\varepsilon$ = 107°.


Workstation cluster To showcase the capabilities of our implementation, we
present a workstation cluster (originally considered in [39] as a CTMC), now
modeled as an MA. The cluster consists of two sub-clusters, each comprising one
switch and N workstations. Within each sub-cluster the workstations are
connected to the switch in a star topology and the two switches are connected
with a backbone. Each of the components may fail with a certain rate. A
controller can (i) acquire additional repair units (up to M) and (ii) control
the movements of the repair units. In Fig. 2a we depict the resulting sets of
achievable points, as computed by our implementation, for N = 16 and M = 4. As
objectives, we considered the long-run average number of operating workstations
$\mathrm{lra}(\mathcal{R}_{\#\mathrm{op}})$, the long-run average probability
that at least N workstations are operational
$\mathrm{lra}(\mathcal{R}_{\#\mathrm{op} \geq N})$, and the total number of
acquired repair units $\mathrm{tot}(\mathcal{R}_{\#\mathrm{rep}})$.


Figure 2: Exemplary results and runtime comparison with MULTIGAIN


Related tools MULTIGAIN [12] is an extension of PRISM [45] that implements
the LP-based approach of [11] for multiple LRA objectives on MDP to answer
qualitative and quantitative achievability as well as Pareto queries. For the
latter, it is briefly mentioned in [12] that ideas of [28] were used, similar to
our approach, but no further details are provided. MULTIGAIN does not support
MA, mixtures with total reward objectives, or Pareto queries with $\ell > 2$
objectives. However, it does support more general quantitative achievability
queries.
PRISM-GAMES [44,43] implements value iteration over convex sets [8,9] to
analyze multiple LRA reward objectives on stochastic games (SGs). By converting
MDPs to 1-player SGs, PRISM-GAMES could also be applied in our setting.
However, some experiments on 1-player SGs indicated that this approach is not
competitive compared to the dedicated MDP implementations in MULTIGAIN
and STORM. We therefore do not consider PRISM-GAMES in our evaluation.


Benchmarks We consider 10 different case studies including the workstation
cluster (clu) as well as benchmarks from QVBS [37] (dpm, rqs, res), from
MULTIGAIN [12] (mut, phi, vir), from [42] (csn, sen), and from [47] (pol). For
each case study we consider 3 concrete instances resulting in 12 MAs and 18
MDPs. The analyzed objectives range over LRA rewards, (goal-bounded) total
rewards, and time-, step- and unbounded reachability probabilities.


Set-up We evaluated the performance of STORM and MULTIGAIN Version 1.0.2³.
All experiments were run on 4 cores⁴ of an Intel Xeon Platinum 8160 CPU with


³ Obtained from http://qav.cs.ox.ac.uk/multigain and invoked with GUROBI [32].
⁴ STORM uses one core, MULTIGAIN uses multiple cores due to JAVA's garbage
collection and GUROBI's parallel solving techniques.

Table 1: Results for pure LRA Pareto queries
Model Par. #lra |S| |MS| |A| |C| |S_C| #iter STORM MULTIGAIN


csn 3 3 177 427 38 158 9 1.23 

csn 4 4 945 2753 176 880 30 109 

csn 5 5 4833 2-10f 782 4622 TO 

mut 3 2 3-10% 5-104 1 3-104 5 3.7 859 

mut 4 2 7105 1-106 1 7-105 4 91.4 TO 

mut 5 2 1-107 3-107 1-110" 2 3197 MO 

phi 4 2 9440 4-107 1 9440 6 1.7 24.7 

phi 5 2 9-10 4-10° 1 9-104 8 24.5 TO 

phi 6 2 2-108 1-107 1 2-10° 2 1221 MO 

res 5-5 2 2618 8577 I 2618 16 1.64 2.31 

res 15-15 2  2-10° 7-10° 1 2-105 3 712 TO 

res 20-20 2 8-10° 2-108 1 8-105 7 299 TO 

sen 2 3 7855 2-107 3996 6105 3 3.41 

sen 3 3 8-10 310%" 5-10* 7-104 4 274 

sen 4 3. 6-10° 3-10° 4-10° 5-10° TO 

vir 2 2 80 393 2 66 4 21 1.47 

vir 3 2 2-107 2-10° 2 2-104 2 1.3 29.3 

vir 4 2 4-107 7-108 ? ? MO MO 
“du à g3 2 20 L0 40 4&4 2w u 3&7 

clu 16-4 2 210 9-10° 4-10 5 2-10° 10 4199 

clu 32-3 2 210 1-108 5-10 4 2-10° TO 

dpm 3-3 2 2640 1008 3240 1 2640 32 19.5 

dpm 4-4 2 3-104 1-104 4-104 1 3-104 33 1179 

dpm 5-5 2 610° 2-10° 7-105 1 6-105 TO 

pol 3-3 2 9522 4801 2-107 1 9522 17 3.44 

pol 4-3 2  5-10* 3-104 1-10° 1 5-104 19 19.2 

pol 4-4 2  8-10° 5-10° 2-108 1 8-10° 29 3350 

rqs 2-2 2 1619 628 2296 I 1618 63 4.52 

rqs 3-3 2  9-10* 4-104 1-10° 1 9-104 106 162 

rqs 5-3 2 2-108 1-10° 4-108 1 2-10° 97 4345 


a time limit of 2 hours and 32 GB RAM. For each experiment we measured the
total runtime (including model building) to solve one query. For qualitative and
quantitative achievability we consider thresholds close to the Pareto front. For
Pareto queries, the approximation precision was set to $10^{-4}$ for both tools.


Results Fig. 2b visualizes the runtime comparison with MULTIGAIN. A point
(x, y) in the plot corresponds to a query that has been solved by STORM in x
seconds and by MULTIGAIN in y seconds. Points on the solid diagonal mean
that both tools were equally fast. The two dotted lines indicate experiments
where STORM only required 1/10 resp. 1/100 of the time of MULTIGAIN. TO and
MO indicate a time-out or memory-out. Tables 1 and 2 provide further data for
Pareto queries. The columns indicate model name and parameters, the number
of LRA reward, total reward, and bounded reachability objectives, the number of
states (|S|), Markovian states (|MS|), successor distributions (|A|), 0-ECs
(|C|), and states within 0-ECs (|S_C|) of the MA or MDP, the number of
iterations (#iter) of Algorithm 1 performed by STORM, and the total runtime of
STORM and MULTIGAIN in seconds. Runtimes are omitted if the tool does not
support the query. MDP (MA) benchmarks are at the top (bottom) of each table.
Table 1 considers pure LRA queries, whereas Table 2 considers mixtures.

Table 2: Results for Pareto queries with other objective types


Model Par. #lra/tot/bnd |S| |MS| |A| |C| |S_C| #iter STORM
res 5-5 2-0-1 2618 8577 1 2618 17 4.27 
res 5-5 2-1-0 2618 8577 1 1705 6 1.43 
res 15-15 2-0-1 2-10° 7-10° i 2-105 4 792 
res 15-15 2-1-0 2-10° 7-105 1 14105 8 1061 
res 20-20 2-0-1 8-105 2-108 1 8-105 8 641 
res 20-20 2-1-0 8-10° 2-10° 1 4105 4 101 
clu 8-3 1-1-0 2-10? 1-10° 4-10° 4-210? 7 163 
clu 16-4 1-1-0 2-10° 9-10° 4-10° 5 2-10° 9 3432 
clu 32-3 1-1-0 2-10 1-106 5-10° 4 2-10° 7 3328 
dpm 3-3 1-0-1 5232 1980 6408 46 3045 2 11.2 
dpm 3-3 1-1-0 4584 1656 5562 25 2856 4 <i 
dpm 4-4 1-0-1 7-104 2-104 8-104 497 4-104 2 214 
dpm 4-4 1-1-0 6-107 2-10* 7-104 301 4-104 4 3.32 
dpm 5-5 1-0-1 1-10® 3-10° 1-10° 6476 6-10° TO 
dpm 5-5 1-1-0 1-10° 3-105 1-10° 4321 6-10° 4 329 
pol 3-3 1-1-0 1-107 5309 2-104 1 9522 3 1.37 
pol 4-3 1-1-0 6-104 3-104 1-10% 1 5-104 3 2.52 
pol 4-4 1-1-0 9-105 5-105 2-108 1 810ë 3 237 
rqs 2-2 1-1-0 2805 1039 4159 1 1618 3 <1 
rqs 3-3 1-1-0 1-10ë 6-104 3-10° 1 9-104 3 4.51 
rqs 5-3 1-1-0 3-10° 2-10° 7-108 1 2-10° 3 182 

Discussion As indicated in Fig. 2b, our implementation outperforms MULTIGAIN
on almost all benchmarks and for all types of queries and is often orders of
magnitude faster. According to MULTIGAIN's log files, the majority of its
runtime is spent on solving LPs, suggesting that the better performance of STORM
is likely due to the iterative approach presented in this work.

Table 1 shows that pure LRA queries on models with millions of states can
be handled. There were no significant runtime gaps between MA and MDP models.
For csn, the increased number of objectives drastically increases the overall
runtime. This is partly due to our naive implementation of the geometric set
representations used in Algorithm 1. Table 2 indicates that the performance and
scalability for mixtures of LRA and other types of objectives is similar. One
exception is queries involving time-bounded reachability on MA (e.g., dpm).
Here, our implementation is based on the single-objective approach of [29],
which is known to be slower than more recent methods [16,15].


Data availability The implementation, models, and log files are available at [49]. 


6 Conclusion 


The analysis of multi-objective model checking queries involving multiple
long-run average rewards can be incorporated into the framework of [28],
enabling (i) the use of off-the-shelf single-objective algorithms for LRA and
(ii) the combination with other kinds of objectives such as total rewards. Our
experiments indicate that this approach clearly outperforms existing algorithms
based on linear programming. Future work includes lifting the approach to
partially observable MDP and stochastic games, potentially using ideas of [10]
and [2], respectively.


Abstract. We present a novel modular approach to infer upper bounds
on the expected runtimes of probabilistic integer programs automatically.
To this end, it computes bounds on the runtimes of program parts and on
the sizes of their variables in an alternating way. To evaluate its power, we
implemented our approach in a new version of our open-source tool KoAT.


1 Introduction 


There exist several approaches and tools for automatic complexity analysis of
non-probabilistic programs, e.g., [2-6, 8, 9, 18, 20, 21, 27, 28, 30, 34-36, 51, 57, 58].
While most of them rely on basic techniques like ranking functions (see, e.g.,
[6, 12-14, 17, 53]), they usually combine these basic techniques in sophisticated
ways. For example, in [18] we developed a modular approach for automated
complexity analysis of integer programs, based on an alternation between finding
symbolic runtime bounds for program parts and using them to infer bounds on
the sizes of variables in such parts. So each analysis step is restricted to a small
part of the program. The implementation of this approach in KoAT [18] (which is
integrated in AProVE [30]) is one of the leading tools for complexity analysis [31].

While there exist several adaptations of basic techniques like ranking functions
to probabilistic programs (e.g., [1, 11, 15, 16, 22-26, 29, 32, 37, 38, 48, 62]), most
of the sophisticated full approaches for complexity analysis have not been adapted
to probabilistic programs yet, and there are only a few powerful tools available
which analyze the runtimes of probabilistic programs automatically [10, 50, 61, 62].

We study probabilistic integer programs (Sect. 2) and define suitable notions of
non-probabilistic and expected runtime and size bounds (Sect. 3). Then, we adapt
our modular approach for runtime and size analysis of [18] to probabilistic
programs (Sect. 4 and 5). Thus, such an adaptation is not only possible for basic
techniques like ranking functions, but also for full approaches for complexity
analysis.

For this adaption, several problems had to be solved. When computing 
expected runtime or size bounds for new program parts, the main difficulty is to 
determine when it is sound to use expected bounds on previous program parts and 
when one has to use non-probabilistic bounds instead. Moreover, the semantics 
of probabilistic programs is significantly different from classical integer programs. 
Thus, the proofs of our techniques differ substantially from the ones in [18], e.g.,
we have to use concepts from measure theory like ranking supermartingales.


* funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
- 235950644 (Project GI 274/6-2) & DFG Research Training Group 2236 UnRAVeL
© The Author(s) 2021


J. F. Groote and K. G. Larsen (Eds.): TACAS 2021, LNCS 12651, pp. 250-269, 2021.
https://doi.org/10.1007/978-3-030-72016-2_14


Inferring Expected Runtimes Using Expected Sizes 251

In Sect. 6, we evaluate the implementation of our new approach in the tool 
KoAT [18,43] and compare with related work. We refer to [47] for an appendix 
of our paper containing all proofs, preliminaries from probability and measure 
theory, and an overview on the benchmark collection used in our evaluation. 


2 Probabilistic Integer Programs 


For any set $M \subseteq \overline{\mathbb{R}}$ (with $\overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$)
and $w \in M$, let $M_{\geq w} = \{v \in M \mid v \geq w \lor v = \infty\}$. For a
set $\mathcal{PV}$ of program variables, we first introduce the kind of bounds
that our approach computes. Similar to [18], our bounds represent weakly
monotonically increasing functions from $\mathcal{PV} \to \overline{\mathbb{R}}_{\geq 0}$.
Such bounds have the advantage that they can easily be "composed", i.e., if $f$
and $g$ are both weakly monotonically increasing upper bounds, then so is
$f \circ g$.


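Definition 1 (Bounds). The set of bounds $\mathcal{B}$ is the smallest set with
$\mathcal{PV} \cup \overline{\mathbb{R}}_{\geq 0} \subseteq \mathcal{B}$, and where
$b_1, b_2 \in \mathcal{B}$ and $v \in \overline{\mathbb{R}}_{\geq 1}$ imply
$b_1 + b_2,\ b_1 \cdot b_2 \in \mathcal{B}$ and $v^{b_1} \in \mathcal{B}$.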
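
The grammar of Definition 1 can be mirrored by a small expression tree. The following is a minimal sketch, assuming a representation with hypothetical class names (`Var`, `Const`, `Add`, `Mul`, `Exp`) that are not taken from KoAT; `evaluate` computes the value of a bound on $|s|$, i.e., on the absolute values of a state.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Var:
    name: str            # a program variable


@dataclass(frozen=True)
class Const:
    value: float         # a non-negative constant (may be math.inf)


@dataclass(frozen=True)
class Add:
    left: "Bound"
    right: "Bound"


@dataclass(frozen=True)
class Mul:
    left: "Bound"
    right: "Bound"


@dataclass(frozen=True)
class Exp:
    base: float          # a constant v >= 1
    exponent: "Bound"


Bound = Union[Var, Const, Add, Mul, Exp]


def evaluate(bound: Bound, state: Dict[str, int]) -> float:
    """Evaluate a bound on |s|, i.e., every variable is replaced by |s(x)|."""
    if isinstance(bound, Var):
        return abs(state[bound.name])
    if isinstance(bound, Const):
        return bound.value
    if isinstance(bound, Add):
        return evaluate(bound.left, state) + evaluate(bound.right, state)
    if isinstance(bound, Mul):
        return evaluate(bound.left, state) * evaluate(bound.right, state)
    if isinstance(bound, Exp):
        return bound.base ** evaluate(bound.exponent, state)
    raise TypeError(f"not a bound: {bound!r}")


# The bound 6 * x^2 + 2 * y as an expression tree.
example = Add(Mul(Const(6), Mul(Var("x"), Var("x"))), Mul(Const(2), Var("y")))
```

Since all constants are non-negative and variables are evaluated by their absolute values, increasing $|s(x)|$ can never decrease the result, which reflects the weak monotonicity required of bounds. Note that this float-based sketch does not treat the product of 0 and infinity specially.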


Our notion of probabilistic programs combines classical integer programs (as in,
e.g., [18]) and probabilistic control flow graphs (see, e.g., [1]). A state $s$ is a
variable assignment $s : \mathcal{V} \to \mathbb{Z}$ for the (finite) set
$\mathcal{V}$ of all variables in the program, where
$\mathcal{PV} \subseteq \mathcal{V}$, $\mathcal{V} \setminus \mathcal{PV}$ is the
set of temporary variables, and $\Sigma$ is the set of all states. For any
$s \in \Sigma$, the state $|s|$ is defined by $|s|(x) = |s(x)|$ for all
$x \in \mathcal{V}$. The set $\mathcal{C}$ of constraints is the smallest set
containing $e_1 \leq e_2$ for all polynomials $e_1, e_2 \in \mathbb{Z}[\mathcal{V}]$
and $c_1 \land c_2$ for all $c_1, c_2 \in \mathcal{C}$. In addition to "$\leq$", in
examples we also use relations like "$>$", which can be simulated by constraints
(e.g., $e_1 > e_2$ is equivalent to $e_2 + 1 \leq e_1$ when regarding integers). We
also allow the application of states to arithmetic expressions $e$ and
constraints $c$. Then the number $s(e)$ resp. $s(c) \in \{\mathbf{t}, \mathbf{f}\}$
results from evaluating the expression resp. the constraint when substituting
every variable $x$ by $s(x)$. So for bounds $b \in \mathcal{B}$, we have
$|s|(b) \in \overline{\mathbb{R}}_{\geq 0}$.

In the transitions of a program, a program variable $x \in \mathcal{PV}$ can also
be updated by adding a value according to a bounded distribution function
$d : \Sigma \to \mathrm{Dist}(\mathbb{Z})$. Here, for any state $s$, $d(s)$ is the
probability distribution of the values that are added to $x$. As usual, a
probability distribution on $\mathbb{Z}$ is a mapping
$pr : \mathbb{Z} \to \mathbb{R}$ with $pr(v) \in [0,1]$ for all $v \in \mathbb{Z}$
and $\sum_{v \in \mathbb{Z}} pr(v) = 1$. Let $\mathrm{Dist}(\mathbb{Z})$ be the set
of distributions $pr$ whose expected value
$\mathbb{E}(pr) = \sum_{v \in \mathbb{Z}} v \cdot pr(v)$ is well defined and
finite, i.e., $\mathbb{E}_{abs}(pr) = \sum_{v \in \mathbb{Z}} |v| \cdot pr(v) < \infty$.
A distribution function $d : \Sigma \to \mathrm{Dist}(\mathbb{Z})$ is bounded if
there is a finite bound $\mathfrak{E}(d) \in \mathcal{B}$ with
$\mathbb{E}_{abs}(d(s)) \leq |s|(\mathfrak{E}(d))$ for all $s \in \Sigma$. Let
$\mathcal{D}$ denote the set of all bounded distribution functions (our
implementation supports Bernoulli, uniform, geometric, hypergeometric, and
binomial distributions, see [43] for details).


Definition 2 (PIP). $(\mathcal{PV}, \mathcal{L}, \mathcal{GT}, \ell_0)$ is a
probabilistic integer program with


1. a finite set of program variables $\mathcal{PV} \subseteq \mathcal{V}$

2. a finite non-empty set of program locations $\mathcal{L}$

3. a finite non-empty set of general transitions $\mathcal{GT}$. A general
transition $g$ is a finite non-empty set of transitions
$t = (\ell, p, \tau, \eta, \ell')$, consisting of

(a) the start and target locations $\ell, \ell' \in \mathcal{L}$ of transition $t$,

(b) the probability $p \geq 0$ that transition $t$ is chosen when $g$ is executed,

(c) the guard $\tau \in \mathcal{C}$ of $t$, and

(d) the update function $\eta : \mathcal{PV} \to \mathbb{Z}[\mathcal{V}] \cup \mathcal{D}$
of $t$, mapping every program variable to an update polynomial or a bounded
distribution function.

All $t \in g$ must have the same start location $\ell$ and the same guard $\tau$.
Thus, we call them the start location and guard of $g$, and denote them by
$\ell_g$ and $\tau_g$. Moreover, the probabilities $p$ of the transitions in $g$
must add up to 1.

4. an initial location $\ell_0 \in \mathcal{L}$, where no transition has target
location $\ell_0$


[Figure: program graph over the locations $\ell_0$, $\ell_1$, $\ell_2$ with the transitions
$t_0 \in g_0$: $\eta(x) = x$, $\eta(y) = y$;
$t_1 \in g_1$: $p = \frac{1}{2}$, $\tau = (x > 0)$, $\eta(x) = x - 1$, $\eta(y) = y + x$;
$t_2 \in g_1$: $p = \frac{1}{2}$, $\tau = (x > 0)$, $\eta(x) = x$, $\eta(y) = y + x$;
$t_3 \in g_2$: $\eta(x) = x$, $\eta(y) = y$;
$t_4 \in g_3$: $\tau = (y > 0)$, $\eta(y) = y - 1$]

Fig. 1: PIP with non-deterministic and probabilistic branching


PIPs allow for both probabilistic and non-deterministic branching and sampling.
Probabilistic branching is modeled by selecting a transition out of a
non-singleton general transition. Non-deterministic branching is represented by
several general transitions with the same start location and non-exclusive
guards. Probabilistic sampling is realized by update functions that map a
program variable to a bounded distribution function. Non-deterministic sampling
is modeled by updating a program variable with an expression containing
temporary variables from $\mathcal{V} \setminus \mathcal{PV}$, whose values are
non-deterministic (but can be restricted in the guard). The set of initial
general transitions $\mathcal{GT}_0 \subseteq \mathcal{GT}$ consists of all general
transitions with start location $\ell_0$.
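
The side conditions of Definition 2 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding in which guards and updates are kept as opaque strings; the names `Transition`, `well_formed_general_transition`, and `well_formed_pip` are illustrative and not part of KoAT. The encoded program follows the PIP of Fig. 1, with the start and target locations taken from the way the figure is used in the examples of this paper.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List


@dataclass
class Transition:
    source: str              # start location
    prob: Fraction           # probability p
    guard: str               # guard, kept as an opaque string in this sketch
    update: Dict[str, str]   # update per program variable, also opaque
    target: str              # target location


def well_formed_general_transition(g: List[Transition]) -> bool:
    """Non-empty, common start location and guard, probabilities add up to 1."""
    if not g:
        return False
    same_start = len({t.source for t in g}) == 1
    same_guard = len({t.guard for t in g}) == 1
    return same_start and same_guard and sum(t.prob for t in g) == 1


def well_formed_pip(locations, general_transitions, initial) -> bool:
    """The initial location exists and is never a target; locations are declared."""
    if initial not in locations or not general_transitions:
        return False
    for g in general_transitions.values():
        if not well_formed_general_transition(g):
            return False
        for t in g:
            if t.source not in locations or t.target not in locations:
                return False
            if t.target == initial:
                return False
    return True


half = Fraction(1, 2)
keep = {"x": "x", "y": "y"}
fig1 = {
    "g0": [Transition("l0", Fraction(1), "true", keep, "l1")],
    "g1": [
        Transition("l1", half, "x > 0", {"x": "x - 1", "y": "y + x"}, "l1"),
        Transition("l1", half, "x > 0", {"x": "x", "y": "y + x"}, "l1"),
    ],
    "g2": [Transition("l1", Fraction(1), "true", keep, "l2")],
    "g3": [Transition("l2", Fraction(1), "y > 0", {"x": "x", "y": "y - 1"}, "l2")],
}
```

Exact fractions are used for the probabilities so that the check "add up to 1" is not affected by floating-point rounding.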


Example 3 (PIP). Consider the PIP in Fig. 1 with initial location $\ell_0$ and the
program variables $\mathcal{PV} = \{x, y\}$. Here, let $p = 1$ and
$\tau = \mathbf{t}$ if not stated explicitly. There are four general transitions:
$g_0 = \{t_0\}$, $g_1 = \{t_1, t_2\}$, $g_2 = \{t_3\}$, and $g_3 = \{t_4\}$, where
$g_1$ and $g_2$ represent a non-deterministic branching. When choosing the
general transition $g_1$, the transitions $t_1$ and $t_2$ encode a probabilistic
branching. If we modified the update $\eta$ and the guard $\tau$ of $t_0$ to
$\eta(x) = u \in \mathcal{V} \setminus \mathcal{PV}$ and $\tau = (u > 0)$, then $x$
would be updated to a non-deterministically chosen positive value. In contrast,
if $\eta(x) = \mathrm{GEO}(\frac{1}{2})$, then $t_0$ would update $x$ by adding a
value sampled from the geometric distribution with parameter $\frac{1}{2}$.


In the following, we regard a fixed PIP $\mathcal{P}$ as in Def. 2. A
configuration is a tuple $(\ell, t, s)$, with the current location
$\ell \in \mathcal{L}$, the current state $s \in \Sigma$, and the transition $t$
that was evaluated last and led to the current configuration. Let
$\mathcal{T} = \bigcup_{g \in \mathcal{GT}} g$. Then
$\mathsf{Conf} = (\mathcal{L} \uplus \{\ell_\bot\}) \times (\mathcal{T} \uplus \{t_{in}, t_\bot\}) \times \Sigma$
is the set of all configurations, with a special location $\ell_\bot$ indicating
the termination of a run, and special transitions $t_{in}$ (used in the first
configuration of a run) and $t_\bot$ (for the configurations of the run after
termination). The (virtual) general transition $g_\bot = \{t_\bot\}$ only contains
$t_\bot$.

A run of a PIP is an infinite sequence
$\vartheta = c_0\, c_1 \cdots \in \mathsf{Conf}^\omega$. Let
$\mathsf{Runs} = \mathsf{Conf}^\omega$ and let $\mathsf{FPath} = \mathsf{Conf}^*$
be the set of all finite paths of configurations.

In our setting, deterministic Markovian schedulers suffice to resolve all
non-determinism (see, e.g., [54, Prop. 6.2.1]). For
$c = (\ell, t, s) \in \mathsf{Conf}$, such a scheduler $\mathfrak{S}$ yields a pair
$\mathfrak{S}(c) = (g, s')$ where $g$ is the next general transition to be taken
(with $\ell = \ell_g$) and $s'$ chooses values for the temporary variables where
$s'(\tau_g) = \mathbf{t}$ and $s(x) = s'(x)$ for all $x \in \mathcal{PV}$. If
$\mathcal{GT}$ contains no such $g$, we get $\mathfrak{S}(c) = (g_\bot, s)$.

For each scheduler $\mathfrak{S}$ and initial state $s_0$, we first define a
probability mass function $pr_{\mathfrak{S},s_0}$. For all $c \in \mathsf{Conf}$,
$pr_{\mathfrak{S},s_0}(c)$ is the probability that a run starts in $c$. Thus,
$pr_{\mathfrak{S},s_0}(c) = 1$ if $c = (\ell_0, t_{in}, s_0)$ and
$pr_{\mathfrak{S},s_0}(c) = 0$, otherwise. Moreover, for all
$c', c \in \mathsf{Conf}$, $pr_{\mathfrak{S},s_0}(c' \to c)$ is the probability that
the configuration $c'$ is followed by the configuration $c$ (see [47] for the
formal definition of $pr_{\mathfrak{S},s_0}$).

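For any $f = c_0 \cdots c_n \in \mathsf{FPath}$, let
$pr_{\mathfrak{S},s_0}(f) = pr_{\mathfrak{S},s_0}(c_0) \cdot pr_{\mathfrak{S},s_0}(c_0 \to c_1) \cdot \ldots \cdot pr_{\mathfrak{S},s_0}(c_{n-1} \to c_n)$.
We say that $f$ is admissible for $\mathfrak{S}$ and $s_0$ if
$pr_{\mathfrak{S},s_0}(f) > 0$. A run $\vartheta$ is admissible if all its finite
prefixes are admissible. A configuration $c \in \mathsf{Conf}$ is admissible if
there is some admissible finite path ending in $c$.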

The semantics of PIPs can now be defined by giving a corresponding probability
space, which is obtained by a standard cylinder construction (see, e.g., [7,60]).
Let $\mathbb{P}_{\mathfrak{S},s_0}$ denote the corresponding probability measure
which lifts $pr_{\mathfrak{S},s_0}$ to cylinder sets: For any
$f \in \mathsf{FPath}$, we have
$pr_{\mathfrak{S},s_0}(f) = \mathbb{P}_{\mathfrak{S},s_0}(\mathrm{Pre}_f)$ for the
set $\mathrm{Pre}_f$ of all runs with prefix $f$. So
$\mathbb{P}_{\mathfrak{S},s_0}(\Theta)$ is the probability that a run from
$\Theta \subseteq \mathsf{Runs}$ is obtained when using the scheduler
$\mathfrak{S}$ and starting in $s_0$.

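We denote the associated expected value operator by
$\mathbb{E}_{\mathfrak{S},s_0}$. So for any random variable
$X : \mathsf{Runs} \to \overline{\mathbb{N}} = \mathbb{N} \cup \{\infty\}$, we have
$\mathbb{E}_{\mathfrak{S},s_0}(X) = \sum_{n \in \overline{\mathbb{N}}} n \cdot \mathbb{P}_{\mathfrak{S},s_0}(X = n)$.
For details on the preliminaries from probability theory we refer to [47].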


3 Complexity Bounds 


In Sect. 3.1, we first recapitulate the concepts of (non-probabilistic) runtime and 
size bounds from [18]. Then we introduce expected runtime and size bounds in 
Sect. 3.2 and connect them to their non-probabilistic counterparts. 


3.1 Runtime and Size Bounds 


Again, let $\mathcal{P}$ denote the PIP which we want to analyze. Def. 4
recapitulates the notions of runtime and size bounds from [18] in our setting.
Recall that bounds from $\mathcal{B}$ do not contain temporary variables, i.e., we
always try to infer bounds in terms of the initial values of the program
variables. Let $\sup \emptyset = 0$, as all occurring sets are subsets of
$\overline{\mathbb{R}}_{\geq 0}$, whose minimal element is 0.


Definition 4 (Runtime and Size Bounds [18]). $\mathcal{RB} : \mathcal{T} \to \mathcal{B}$
is a runtime bound and $\mathcal{SB} : \mathcal{T} \times \mathcal{V} \to \mathcal{B}$
is a size bound if for all transitions $t \in \mathcal{T}$, all variables
$x \in \mathcal{V}$, all schedulers $\mathfrak{S}$, and all states
$s_0 \in \Sigma$, we have


$|s_0|(\mathcal{RB}(t)) \geq \sup\{\, |\{i \mid t_i = t\}| \mid f = (\_, t_0, \_) \cdots (\_, t_n, \_) \land pr_{\mathfrak{S},s_0}(f) > 0 \,\}$,
$|s_0|(\mathcal{SB}(t, x)) \geq \sup\{\, |s(x)| \mid f = \cdots (\_, t, s) \land pr_{\mathfrak{S},s_0}(f) > 0 \,\}$.

So $\mathcal{RB}(t)$ is a bound on the number of executions of $t$ and
$\mathcal{SB}(t, x)$ over-approximates the greatest absolute value that
$x \in \mathcal{V}$ takes after the application of the transition $t$ in any
admissible finite path. Note that Def. 4 does not apply to $t_{in}$ and
$t_\bot$, since they are not contained in $\mathcal{T}$.

We call a tuple (RB, SB) a (non-probabilistic) bound pair. We will use such 
non-probabilistic bound pairs for an initialization of expected bounds (Thm. 10) 
and to compute improved expected runtime and size bounds in Sect. 4 and 5. 


Example 5 (Bound Pair). The technique of [18] computes the following bound
pair for the PIP of Fig. 1 (by ignoring the probabilities of the transitions).


$\mathcal{RB}(t) = \begin{cases} 1, & \text{if } t = t_0 \text{ or } t = t_3 \\ x, & \text{if } t = t_1 \\ \infty, & \text{if } t = t_2 \text{ or } t = t_4 \end{cases}$

$\mathcal{SB}(t, x) = \begin{cases} x, & \text{if } t \in \{t_0, t_1, t_2\} \\ 3 \cdot x, & \text{if } t \in \{t_3, t_4\} \end{cases}$

$\mathcal{SB}(t, y) = \begin{cases} y, & \text{if } t = t_0 \\ \infty, & \text{if } t \in \{t_1, t_2, t_3, t_4\} \end{cases}$


Clearly, $t_0$ and $t_3$ can only be evaluated once. Since $t_1$ decrements $x$
and no transition increments it, $t_1$'s runtime is bounded by $|s_0|(x)$.
However, $t_2$ can be executed arbitrarily often if $s_0(x) > 0$. Thus, the
runtimes of $t_2$ and $t_4$ are unbounded (i.e., $\mathcal{P}$ is not terminating
when regarding it as a non-probabilistic program). $\mathcal{SB}(t, x)$ is finite
for all transitions $t$, since $x$ is never increased. In contrast, the value of
$y$ can be arbitrarily large after all transitions but $t_0$.


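3.2 Expected Runtime and Size Bounds


We now define the expected runtime and size complexity of a PIP $\mathcal{P}$.


Definition 6 (Expected Runtime Complexity, PAST [15]). For $g \in \mathcal{GT}$,
its runtime is the random variable $\mathcal{R}(g)$ where
$\mathcal{R} : \mathcal{GT} \to \mathsf{Runs} \to \overline{\mathbb{N}}$ with

$\mathcal{R}(g)(\,(\_, t_0, \_)\,(\_, t_1, \_) \cdots) = |\{i \mid t_i \in g\}|$.


For a scheduler $\mathfrak{S}$ and $s_0 \in \Sigma$, the expected runtime
complexity of $g \in \mathcal{GT}$ is $\mathbb{E}_{\mathfrak{S},s_0}(\mathcal{R}(g))$
and the expected runtime complexity of $\mathcal{P}$ is
$\sum_{g \in \mathcal{GT}} \mathbb{E}_{\mathfrak{S},s_0}(\mathcal{R}(g))$.
If $\mathcal{P}$'s expected runtime complexity is finite for every scheduler
$\mathfrak{S}$ and every initial state $s_0$, then $\mathcal{P}$ is called
positively almost surely terminating (PAST).


So $\mathcal{R}(g)(\vartheta)$ is the number of executions of a transition from
$g$ in the run $\vartheta$.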

While non-probabilistic size bounds refer to pairs $(t, x)$ of transitions
$t \in \mathcal{T}$ and variables $x \in \mathcal{V}$ (so-called result variables
in [18]), we now introduce expected size bounds for general result variables
$(g, \ell, x)$, which consist of a general transition $g$, one of its target
locations $\ell$, and a program variable $x \in \mathcal{PV}$. So $x$ must not
be a temporary variable (which represents non-probabilistic non-determinism),
since general result variables are used for expected size bounds.


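Definition 7 (Expected Size Complexity). The set of general result variables is
$\mathcal{GRV} = \{(g, \ell, x) \mid g \in \mathcal{GT},\ x \in \mathcal{PV},\ (\_, \_, \_, \_, \ell) \in g\}$.
The size of $\alpha = (g, \ell, x) \in \mathcal{GRV}$ is the random variable
$\mathcal{S}(\alpha)$ where
$\mathcal{S} : \mathcal{GRV} \to \mathsf{Runs} \to \overline{\mathbb{N}}$ with


$\mathcal{S}(g, \ell, x)(\,(\ell_0, t_0, s_0)\,(\ell_1, t_1, s_1) \cdots) = \sup\{\, |s_i(x)| \mid \ell_i = \ell \land t_i \in g \,\}$.

For a scheduler $\mathfrak{S}$ and $s_0$, the expected size complexity of
$\alpha \in \mathcal{GRV}$ is $\mathbb{E}_{\mathfrak{S},s_0}(\mathcal{S}(\alpha))$.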


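So for any run $\vartheta$, $\mathcal{S}(g, \ell, x)(\vartheta)$ is the greatest
absolute value of $x$ in location $\ell$, whenever $\ell$ was entered with a
transition from $g$. We now define bounds for the expected runtime and size
complexity which hold independent of the scheduler.


Definition 8 (Expected Runtime and Size Bounds).

• $\mathcal{RB}_{\mathbb{E}} : \mathcal{GT} \to \mathcal{B}$ is an expected runtime
bound if for all $g \in \mathcal{GT}$, all schedulers $\mathfrak{S}$, and all
$s_0 \in \Sigma$, we have
$|s_0|(\mathcal{RB}_{\mathbb{E}}(g)) \geq \mathbb{E}_{\mathfrak{S},s_0}(\mathcal{R}(g))$.

• $\mathcal{SB}_{\mathbb{E}} : \mathcal{GRV} \to \mathcal{B}$ is an expected size
bound if for all $\alpha \in \mathcal{GRV}$, all schedulers $\mathfrak{S}$, and all
$s_0 \in \Sigma$, we have
$|s_0|(\mathcal{SB}_{\mathbb{E}}(\alpha)) \geq \mathbb{E}_{\mathfrak{S},s_0}(\mathcal{S}(\alpha))$.

• A pair $(\mathcal{RB}_{\mathbb{E}}, \mathcal{SB}_{\mathbb{E}})$ is called an
expected bound pair.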


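Example 9 (Expected Runtime and Size Bounds). Our new techniques from
Sect. 4 and 5 will derive the following expected bounds for the PIP from Fig. 1.


$\mathcal{RB}_{\mathbb{E}}(g) = \begin{cases} 1, & \text{if } g \in \{g_0, g_2\} \\ 2 \cdot x, & \text{if } g = g_1 \\ 6 \cdot x^2 + 2 \cdot y, & \text{if } g = g_3 \end{cases}$
$\qquad \mathcal{SB}_{\mathbb{E}}(g, \_, x) = \begin{cases} x, & \text{if } g = g_0 \\ 2 \cdot x, & \text{if } g = g_1 \\ 3 \cdot x, & \text{if } g \in \{g_2, g_3\} \end{cases}$

$\mathcal{SB}_{\mathbb{E}}(g_0, \ell_1, y) = y \qquad \mathcal{SB}_{\mathbb{E}}(g_2, \ell_2, y) = 6 \cdot x^2 + 2 \cdot y$
$\mathcal{SB}_{\mathbb{E}}(g_1, \ell_1, y) = 6 \cdot x^2 + y \qquad \mathcal{SB}_{\mathbb{E}}(g_3, \ell_2, y) = 12 \cdot x^2 + 4 \cdot y$


While the runtimes of $t_2$ and $t_4$ were unbounded in the non-probabilistic
case (Ex. 5), we obtain finite bounds on the expected runtimes of
$g_1 = \{t_1, t_2\}$ and $g_3 = \{t_4\}$. For example, we can expect $x$ to be
non-positive after at most $|s_0|(2 \cdot x)$ iterations of $g_1$. Based on the
above expected runtime bounds, the expected runtime complexity of the PIP is at
most $|s_0|(\mathcal{RB}_{\mathbb{E}}(g_0) + \ldots + \mathcal{RB}_{\mathbb{E}}(g_3)) = |s_0|(2 + 2 \cdot x + 2 \cdot y + 6 \cdot x^2)$,
i.e., it is in $O(n^2)$ where $n$ is the maximal absolute value of the program
variables at the start of the program.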
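
The expected runtime bounds of Ex. 9 can be compared with a simple simulation. The following is only an illustrative sketch, not part of the approach or of KoAT: it fixes one particular scheduler (an assumption of this sketch) that takes $g_1$ as long as its guard $x > 0$ holds and only then moves on via $g_2$, and it estimates the expected number of executions of $g_1$ and $g_3$ by sampling. The function names and the sample size are hypothetical.

```python
import random


def simulate(x0, y0, rng):
    """One run of the PIP of Fig. 1 under the scheduler that prefers g1."""
    x, y = x0, y0
    runs_g1 = runs_g3 = 0
    while x > 0:                  # general transition g1 with guard x > 0
        runs_g1 += 1
        y += x                    # both t1 and t2 set y to y + x (old value of x)
        if rng.random() < 0.5:    # t1 is chosen with probability 1/2
            x -= 1
    # g2 (transition t3) moves to the second loop without changing x or y
    while y > 0:                  # general transition g3 with guard y > 0
        runs_g3 += 1
        y -= 1
    return runs_g1, runs_g3


def estimate(x0, y0, samples=20000, seed=0):
    """Sample means of the number of executions of g1 and g3."""
    rng = random.Random(seed)
    total_g1 = total_g3 = 0
    for _ in range(samples):
        runs_g1, runs_g3 = simulate(x0, y0, rng)
        total_g1 += runs_g1
        total_g3 += runs_g3
    return total_g1 / samples, total_g3 / samples
```

For the initial values x = 5 and y = 3, the bound $2 \cdot x$ evaluates to 10 and $6 \cdot x^2 + 2 \cdot y$ evaluates to 156. Under the chosen scheduler, the sample mean for $g_1$ should be close to 10, and the sample mean for $g_3$ should stay well below 156. A simulation of one scheduler can of course only illustrate the bounds, not prove them.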


The following theorem shows that non-probabilistic bounds can be lifted to
expected bounds, since they do not only bound the expected value of
$\mathcal{R}(g)$ resp. $\mathcal{S}(\alpha)$, but the whole distribution. As
mentioned, all proofs can be found in [47].


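Theorem 10 (Lifting Bounds). For a bound pair $(\mathcal{RB}, \mathcal{SB})$,
$(\mathcal{RB}_{\mathbb{E}}, \mathcal{SB}_{\mathbb{E}})$ with
$\mathcal{RB}_{\mathbb{E}}(g) = \sum_{t \in g} \mathcal{RB}(t)$ and
$\mathcal{SB}_{\mathbb{E}}(g, \ell, x) = \sum_{t = (\_, \_, \_, \_, \ell) \in g} \mathcal{SB}(t, x)$
is an expected bound pair.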


Here, we over-approximate the maximum of $\mathcal{SB}(t, x)$ for
$t = (\_, \_, \_, \_, \ell) \in g$ by their sum. For asymptotic bounds, this does
not affect precision, since $\max(f, g)$ and $f + g$ have the same asymptotic
growth for any non-negative functions $f, g$.


Example 11 (Lifting of Bounds). When lifting the bound pair of Ex. 5 to expected
bounds according to Thm. 10, one would obtain
$\mathcal{RB}_{\mathbb{E}}(g_0) = \mathcal{RB}_{\mathbb{E}}(g_2) = 1$ and
$\mathcal{RB}_{\mathbb{E}}(g_1) = \mathcal{RB}_{\mathbb{E}}(g_3) = \infty$. Moreover,
$\mathcal{SB}_{\mathbb{E}}(g_0, \ell_1, x) = x$,
$\mathcal{SB}_{\mathbb{E}}(g_1, \ell_1, x) = 2 \cdot x$,
$\mathcal{SB}_{\mathbb{E}}(g_2, \ell_2, x) = \mathcal{SB}_{\mathbb{E}}(g_3, \ell_2, x) = 3 \cdot x$,
$\mathcal{SB}_{\mathbb{E}}(g_0, \ell_1, y) = y$, and
$\mathcal{SB}_{\mathbb{E}}(g, \_, y) = \infty$ whenever $g \neq g_0$. Thus, with
these lifted bounds one cannot show that $\mathcal{P}$'s expected runtime
complexity is finite, i.e., they are substantially less precise than the finite
expected bounds from Ex. 9. Our approach will compute such finite expected
bounds by repeatedly improving the lifted bounds of Thm. 10.
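
The lifting of Thm. 10 amounts to two sums, which can be reproduced for the bound pair of Ex. 5. The following is a minimal sketch, assuming a hypothetical encoding where every bound is a Python function of the initial values (x, y) and where the dictionaries `RB`, `SB`, `TARGET`, and `GT` are illustrative names only.

```python
import math

# Non-probabilistic bound pair of Ex. 5 as functions of the initial values.
# Taking absolute values models the evaluation on |s0|.
RB = {
    "t0": lambda x, y: 1,
    "t1": lambda x, y: abs(x),
    "t2": lambda x, y: math.inf,
    "t3": lambda x, y: 1,
    "t4": lambda x, y: math.inf,
}

SB = {}
for t in ("t0", "t1", "t2"):
    SB[(t, "x")] = lambda x, y: abs(x)
for t in ("t3", "t4"):
    SB[(t, "x")] = lambda x, y: 3 * abs(x)
SB[("t0", "y")] = lambda x, y: abs(y)
for t in ("t1", "t2", "t3", "t4"):
    SB[(t, "y")] = lambda x, y: math.inf

# Target location of every transition and the general transitions of Fig. 1.
TARGET = {"t0": "l1", "t1": "l1", "t2": "l1", "t3": "l2", "t4": "l2"}
GT = {"g0": ["t0"], "g1": ["t1", "t2"], "g2": ["t3"], "g3": ["t4"]}


def lifted_runtime(g, x, y):
    """Sum of the runtime bounds of all transitions in g."""
    return sum(RB[t](x, y) for t in GT[g])


def lifted_size(g, loc, var, x, y):
    """Sum of the size bounds of all transitions in g with target location loc."""
    return sum(SB[(t, var)](x, y) for t in GT[g] if TARGET[t] == loc)
```

Evaluated at any initial state, these sums agree with the lifted bounds listed in Ex. 11; in particular, the lifted runtime bounds of $g_1$ and $g_3$ are infinite.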


4 Computing Expected Runtime Bounds


We first present a new variant of probabilistic linear ranking functions in Sect. 4.1. 
Based on this, in Sect. 4.2 we introduce our modular technique to infer expected 
runtime bounds by using expected size bounds. 


4.1 Probabilistic Linear Ranking Functions 


For probabilistic programs, several techniques based on ranking supermartingales 
have been developed. In this section, we define a class of probabilistic ranking 
functions that will be suitable for our modular analysis. 

We restrict ourselves to ranking functions
$\mathfrak{r} : \mathcal{L} \to \mathbb{R}[\mathcal{PV}]_{\mathrm{lin}}$ that map
every location to a linear polynomial (i.e., of at most degree 1) without
temporary variables. The linearity restriction is common to ease the automated
inference of ranking functions. Moreover, this restriction will be needed for
the soundness of our technique. Nevertheless, our approach of course also infers
non-linear expected runtimes (by combining the linear bounds obtained for
different program parts).

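Let $exp_{\mathfrak{r},g,s}$ denote the expected value of $\mathfrak{r}$ after an
execution of $g \in \mathcal{GT}$ in state $s \in \Sigma$. Here, $s_\eta(x)$ is the
expected value of $x \in \mathcal{PV}$ after performing the update $\eta$ in
state $s$. So if $\eta(x) \in \mathcal{D}$, then $x$'s expected value after the
update results from adding the expected value of the probability distribution
$\eta(x)(s)$:


$exp_{\mathfrak{r},g,s} = \sum_{(\ell, p, \tau, \eta, \ell') \in g} p \cdot s_\eta(\mathfrak{r}(\ell'))$
with
$s_\eta(x) = \begin{cases} s(\eta(x)), & \text{if } \eta(x) \in \mathbb{Z}[\mathcal{V}] \\ s(x) + \mathbb{E}(\eta(x)(s)), & \text{if } \eta(x) \in \mathcal{D} \end{cases}$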


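Definition 12 (PLRF). Let
$\mathcal{GT}_{>} \subseteq \mathcal{GT}_{\mathrm{ni}} \subseteq \mathcal{GT}$. Then
$\mathfrak{r} : \mathcal{L} \to \mathbb{R}[\mathcal{PV}]_{\mathrm{lin}}$ is a
probabilistic linear ranking function (PLRF) for $\mathcal{GT}_{>}$ and
$\mathcal{GT}_{\mathrm{ni}}$ if for all
$g \in \mathcal{GT}_{\mathrm{ni}} \setminus \mathcal{GT}_{>}$ and
$c' \in \mathsf{Conf}$ there is a $\bowtie_{g,c'} \in \{<, \geq\}$ such that for all
finite paths $\cdots c'\, c$ that are admissible for some $\mathfrak{S}$ and
$s_0 \in \Sigma$, and where $c = (\ell, t, s)$ (i.e., where $t$ is the transition
that is used in the step from $c'$ to $c$), we have:

Boundedness (a): If $t \in g$ for a $g \in \mathcal{GT}_{\mathrm{ni}} \setminus \mathcal{GT}_{>}$, then $s(\mathfrak{r}(\ell)) \bowtie_{g,c'} 0$.
Boundedness (b): If $t \in g$ for a $g \in \mathcal{GT}_{>}$, then $s(\mathfrak{r}(\ell)) \geq 0$.
Non-Increase: If $\ell = \ell_g$ for a $g \in \mathcal{GT}_{\mathrm{ni}}$ and $s(\tau_g) = \mathbf{t}$, then $s(\mathfrak{r}(\ell)) \geq exp_{\mathfrak{r},g,s}$.
Decrease: If $\ell = \ell_g$ for a $g \in \mathcal{GT}_{>}$ and $s(\tau_g) = \mathbf{t}$, then $s(\mathfrak{r}(\ell)) - 1 \geq exp_{\mathfrak{r},g,s}$.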


So if one is restricted to the sub-program with the non-increasing transitions
$\mathcal{GT}_{\mathrm{ni}}$, then $\mathfrak{r}(\ell)$ is an upper bound on the
expected number of applications of transitions from $\mathcal{GT}_{>}$ when
starting in $\ell$. Hence, a PLRF for
$\mathcal{GT}_{>} = \mathcal{GT}_{\mathrm{ni}} = \mathcal{GT}$ would imply that the
program is PAST (see, e.g., [1, 16, 24, 25]). However, our PLRFs differ from the
standard notion of probabilistic ranking functions by considering arbitrary
subsets $\mathcal{GT}_{\mathrm{ni}} \subseteq \mathcal{GT}$. This is needed for the
modularity of our approach which allows us to analyze program parts separately
(e.g., $\mathcal{GT} \setminus \mathcal{GT}_{\mathrm{ni}}$ is ignored when inferring
a PLRF). Thus, our "Boundedness" conditions differ slightly from the
corresponding conditions in other definitions. Condition (b) requires that
$g \in \mathcal{GT}_{>}$ never leads to a configuration where $\mathfrak{r}$ is
negative. Condition (a) states that in an admissible path where
$g = \{t_1, t_2, \ldots\} \in \mathcal{GT}_{\mathrm{ni}} \setminus \mathcal{GT}_{>}$
is used for continuing in configuration $c'$, if executing $t_1$ in $c'$ makes
$\mathfrak{r}$ negative, then executing $t_2$ must make $\mathfrak{r}$ negative as
well. Thus, such a $g$ can never come before a general transition from
$\mathcal{GT}_{>}$ in an admissible path and hence, $g$ can be ignored when
inferring upper bounds on the runtime. This increases the power of our approach
and it allows us to consider only non-negative random variables in our
correctness proofs.

We use SMT solvers to generate PLRFs automatically. Then for "Boundedness",
we regard all $s' \in \Sigma$ with $s'(\tau_g) = \mathbf{t}$ and require
"Boundedness" for any state $s$ that is reachable from $s'$.


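Example 13 (PLRFs). Consider again the PIP in Fig. 1 and the sets
$\mathcal{GT}_{>} = \mathcal{GT}_{\mathrm{ni}} = \{g_1\}$ and
$\mathcal{GT}'_{>} = \mathcal{GT}'_{\mathrm{ni}} = \{g_3\}$, which correspond to its
two loops.

The function $\mathfrak{r}$ with $\mathfrak{r}(\ell_1) = 2 \cdot x$ and
$\mathfrak{r}(\ell_0) = \mathfrak{r}(\ell_2) = 0$ is a PLRF for
$\mathcal{GT}_{>} = \mathcal{GT}_{\mathrm{ni}}$: For every admissible configuration
$(\ell, t, s)$ with $t \in g_1$ we have $\ell = \ell_1$ and
$s(\mathfrak{r}(\ell_1)) = 2 \cdot s(x) \geq 0$, since $x$ was positive before (due
to $g_1$'s guard) and it was either decreased by 1 or not changed by the update
of $t_1$ resp. $t_2$. Hence $\mathfrak{r}$ is bounded. Moreover, for
$s_1(x) = s(x - 1) = s(x) - 1$ and $s_2(x) = s(x)$ we have:


$exp_{\mathfrak{r},g_1,s} = \frac{1}{2} \cdot s_1(\mathfrak{r}(\ell_1)) + \frac{1}{2} \cdot s_2(\mathfrak{r}(\ell_1)) = 2 \cdot s(x) - 1 = s(\mathfrak{r}(\ell_1)) - 1$


So $\mathfrak{r}$ is decreasing on $g_1$ and as
$\mathcal{GT}_{>} = \mathcal{GT}_{\mathrm{ni}}$, also the non-increase property
holds. Similarly, $\mathfrak{r}'$ with $\mathfrak{r}'(\ell_2) = y$ and
$\mathfrak{r}'(\ell_0) = \mathfrak{r}'(\ell_1) = 0$ is a PLRF for
$\mathcal{GT}'_{>} = \mathcal{GT}'_{\mathrm{ni}}$.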
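
The computation in Ex. 13 can be replayed with exact arithmetic. The following is a minimal sketch, assuming a hypothetical encoding of $g_1$ as a list of triples (probability, update, target location); the names `rank`, `G1`, `expected_rank`, and `decreases_on_g1` are illustrative. Since both updates are polynomials, the expected value of the ranking function after the step is obtained by evaluating it in the updated state.

```python
from fractions import Fraction


def rank(location, state):
    """The ranking function of Ex. 13: 2 * x at l1 and 0 elsewhere."""
    return 2 * state["x"] if location == "l1" else 0


# General transition g1 of Fig. 1: (probability, update, target location).
G1 = [
    (Fraction(1, 2), lambda s: {"x": s["x"] - 1, "y": s["y"] + s["x"]}, "l1"),
    (Fraction(1, 2), lambda s: {"x": s["x"], "y": s["y"] + s["x"]}, "l1"),
]


def expected_rank(general_transition, state):
    """Probability-weighted value of the ranking function after one step."""
    return sum(p * rank(target, update(state))
               for p, update, target in general_transition)


def decreases_on_g1(state):
    """Decrease condition at l1, assuming the guard x > 0 holds in `state`."""
    return rank("l1", state) - 1 >= expected_rank(G1, state)
```

For every state satisfying the guard, the expected value is exactly $2 \cdot s(x) - 1$, so the decrease condition holds with equality.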


In our implementation, $\mathcal{GT}_{>}$ is always a singleton and we let
$\mathcal{GT}_{\mathrm{ni}} \subseteq \mathcal{GT}$ be a cycle in the call graph
where we find a PLRF for $\mathcal{GT}_{>} \subseteq \mathcal{GT}_{\mathrm{ni}}$.
The next subsection shows how we can then obtain an expected runtime bound for
the overall program by searching for suitable ranking functions repeatedly.


4.2 Inferring Expected Runtime Bounds 


Our approach to infer expected runtime bounds is based on an underlying
(non-probabilistic) bound pair $(\mathcal{RB}, \mathcal{SB})$ which is computed by
existing techniques (in our implementation, we use [18]). To do so, we abstract
the PIP to a standard integer transition system by ignoring the probabilities of
transitions and replacing probabilistic with non-deterministic sampling (e.g.,
the update $\eta(x) = \mathrm{GEO}(\frac{1}{2})$ would be replaced by
$\eta(x) = x + u$ with $u \in \mathcal{V} \setminus \mathcal{PV}$, where $u > 0$ is
added to the guard). Of course, we usually have $\mathcal{RB}(t) = \infty$ for
some transitions $t$.

We start with the expected bound pair
$(\mathcal{RB}_{\mathbb{E}}, \mathcal{SB}_{\mathbb{E}})$ that is obtained by lifting
$(\mathcal{RB}, \mathcal{SB})$ as in Thm. 10. Afterwards, the expected runtime
bound $\mathcal{RB}_{\mathbb{E}}$ is improved repeatedly by applying the following
Thm. 16 (and similarly, $\mathcal{SB}_{\mathbb{E}}$ is improved repeatedly by
applying Thm. 23 and 25 from Sect. 5). Our approach alternates the improvement of
$\mathcal{RB}_{\mathbb{E}}$ and $\mathcal{SB}_{\mathbb{E}}$, and it uses expected
size bounds on "previous" transitions to improve expected runtime bounds, and
vice versa.

To improve RBg, we generate a PLRF t for a part of the program. To obtain 
a bound for the full program from rt, one has to determine which transitions can 
enter the program part and from which locations it can be entered. 


Definition 14 (Entry Locations and Transitions). For GT_ni ⊆ GT and ℓ ∈
L, the entry transitions are ET_{GT_ni}(ℓ) = {g ∈ GT \ GT_ni | ∃t ∈ g. t = (_, _, _, _, ℓ)}.
Then the entry locations are all start locations of GT_ni whose entry transitions
are not empty, i.e., EL_{GT_ni} = {ℓ | ET_{GT_ni}(ℓ) ≠ ∅ ∧ (ℓ, _, _, _, _) ∈ ⋃GT_ni}.¹


Example 15 (Entry Locations and Transitions). For the PIP from Fig. 1 and
GT_ni = {g1}, we have EL_{GT_ni} = {ℓ1} and ET_{GT_ni}(ℓ1) = {g0}. So the loop formed
by g1 is entered at location ℓ1 and the general transition g0 has to be executed
before. Similarly, for GT'_ni = {g3} we have EL_{GT'_ni} = {ℓ2} and ET_{GT'_ni}(ℓ2) = {g2}.
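
Def. 14 can be read as two set comprehensions. The sketch below is a minimal Python rendering under an assumed encoding of the PIP of Fig. 1 that is reconstructed from the examples in this section (only source and target locations are kept, and the transition name t4 is a placeholder).

```python
# Assumed encoding of the PIP of Fig. 1: each general transition maps to a
# set of (name, source location, target location) triples.
GT = {
    "g0": {("t0", "l0", "l1")},
    "g1": {("t1", "l1", "l1"), ("t2", "l1", "l1")},
    "g2": {("t3", "l1", "l2")},
    "g3": {("t4", "l2", "l2")},   # the name t4 is a placeholder
}

def entry_transitions(gt_ni, loc):
    """ET_{GT_ni}(loc): general transitions outside gt_ni with a transition into loc."""
    return {g for g, ts in GT.items()
            if g not in gt_ni and any(target == loc for _, _, target in ts)}

def entry_locations(gt_ni):
    """EL_{GT_ni}: start locations of gt_ni that have at least one entry transition."""
    starts = {source for g in gt_ni for _, source, _ in GT[g]}
    return {loc for loc in starts if entry_transitions(gt_ni, loc)}
```

Calling `entry_locations({"g1"})` and `entry_transitions({"g1"}, "l1")` reproduces the sets stated in Ex. 15 for the first loop.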


Recall that if r is a PLRF for GT_> ⊆ GT_ni, then in a program that is restricted
to GT_ni, r(ℓ) is an upper bound on the expected number of executions of transitions
from GT_> when starting in ℓ. Since r(ℓ) may contain negative coefficients, it is
not weakly monotonically increasing in general. To turn expressions e ∈ R[PV]
into bounds from B, let the over-approximation ⌈·⌉ replace all coefficients by
their absolute value. So for example, ⌈x − y⌉ = ⌈x + (−1)·y⌉ = x + y. Clearly,
we have |s|(⌈e⌉) ≥ |s|(e) for all s ∈ Σ. Moreover, if e ∈ R[PV] then ⌈e⌉ ∈ B.
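
As a small illustration of the over-approximation ⌈·⌉, the following Python sketch represents a polynomial as a dictionary from monomials to coefficients. This representation and the function names are assumptions made for the example only.

```python
def over_approx(poly):
    """Replace every coefficient of a polynomial by its absolute value.

    A polynomial is a dict mapping monomials (tuples of variable names,
    () for the constant monomial) to coefficients.
    """
    return {mono: abs(coeff) for mono, coeff in poly.items()}

def evaluate(poly, state):
    """Evaluate a polynomial under a state (dict from variables to numbers)."""
    total = 0
    for mono, coeff in poly.items():
        term = coeff
        for var in mono:
            term *= state[var]
        total += term
    return total

def abs_state(state):
    """|s|: the state that maps every variable to its absolute value."""
    return {var: abs(val) for var, val in state.items()}
```

For e = x − y, `over_approx` yields x + y, and evaluating both under |s| shows the inequality |s|(⌈e⌉) ≥ |s|(e) on concrete states.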

To turn ⌈r(ℓ)⌉ into a bound for the full program, one has to take into account
how often the sub-program with the transitions GT_ni is reached via an entry
transition h ∈ ET_{GT_ni}(ℓ) for some ℓ ∈ EL_{GT_ni}. This can be over-approximated
by Σ_{t=(_,_,_,_,ℓ)∈h} RB(t), which is an upper bound on the number of times that
transitions in h to the entry location ℓ of GT_ni are applied in a full program run.

The bound ⌈r(ℓ)⌉ is expressed in terms of the program variables at the entry
location ℓ of GT_ni. To obtain a bound in terms of the variables at the start of the
program, one has to take into account which value a program variable x may have
when the sub-program GT_ni is reached. For every entry transition h ∈ ET_{GT_ni}(ℓ),
this value can be over-approximated by SB_E(h, ℓ, x). Thus, we have to instantiate
each variable x in ⌈r(ℓ)⌉ by SB_E(h, ℓ, x). Let SB_E(h, ℓ, ·) : PV → B be
the mapping with SB_E(h, ℓ, ·)(x) = SB_E(h, ℓ, x). Hence, SB_E(h, ℓ, ·)(⌈r(ℓ)⌉)
over-approximates the expected number of applications of GT_> if GT_ni is entered in
location ℓ, where this bound is expressed in terms of the input variables of the program.
Here, weak monotonic increase of ⌈r(ℓ)⌉ ensures that instantiating its variables by
an over-approximation of their size yields an over-approximation of the runtime.


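Theorem 16 (Expected Runtime Bounds). Let (RB_E, SB_E) be an expected
bound pair, RB a (non-probabilistic) runtime bound, and r a PLRF for GT_> ⊆
GT_ni ⊆ GT. Then RB'_E : GT → B is an expected runtime bound where


\[
\mathcal{RB}'_{\mathbb{E}}(g) =
\begin{cases}
\sum_{\ell \in \mathcal{EL}_{\mathcal{GT}_{\mathrm{ni}}}} \; \sum_{h \in \mathcal{ET}_{\mathcal{GT}_{\mathrm{ni}}}(\ell)} \Bigl( \sum_{t = (\_,\_,\_,\_,\ell) \in h} \mathcal{RB}(t) \Bigr) \cdot \bigl( \mathcal{SB}_{\mathbb{E}}(h,\ell,\cdot)\,(\lceil r(\ell) \rceil) \bigr), & \text{if } g \in \mathcal{GT}_{>} \\
\mathcal{RB}_{\mathbb{E}}(g), & \text{if } g \notin \mathcal{GT}_{>}
\end{cases}
\]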


Example 17 (Expected Runtime Bounds). For the PIP from Fig. 1, our approach
starts with (RB_E, SB_E) from Ex. 11 which results from lifting the bound pair from
Ex. 5. To improve the bound RB_E(g1) = ∞, we use the PLRF r for GT_> = GT_ni =
{g1} from Ex. 13. By Ex. 15, we have EL_{GT_ni} = {ℓ1} and ET_{GT_ni}(ℓ1) = {g0} with
g0 = {t0}, whose runtime bound is RB(t0) = 1, see Ex. 5. Using the expected
size bound SB_E(g0, ℓ1, x) = x from Ex. 9, Thm. 16 yields


\[ \mathcal{RB}'_{\mathbb{E}}(g_1) = \mathcal{RB}(t_0) \cdot \mathcal{SB}_{\mathbb{E}}(g_0,\ell_1,\cdot)\,(\lceil r(\ell_1) \rceil) = 1 \cdot 2 \cdot x = 2 \cdot x. \]


To improve RB_E(g3), we use the PLRF r' for GT'_> = GT'_ni = {g3} from Ex. 13. As
EL_{GT'_ni} = {ℓ2} and ET_{GT'_ni}(ℓ2) = {g2} by Ex. 15, where g2 = {t3} and RB(t3) = 1
(Ex. 5), with the bound SB_E(g2, ℓ2, y) = 6·x² + 2·y from Ex. 9, Thm. 16 yields


\[ \mathcal{RB}'_{\mathbb{E}}(g_3) = \mathcal{RB}(t_3) \cdot \mathcal{SB}_{\mathbb{E}}(g_2,\ell_2,\cdot)\,(\lceil r'(\ell_2) \rceil) = 1 \cdot \mathcal{SB}_{\mathbb{E}}(g_2,\ell_2,y) = 6 \cdot x^2 + 2 \cdot y. \]


So based on the expected size bounds of Ex. 9, we have shown how to compute
the expected runtime bounds of Ex. 9 automatically.

¹ For a set of sets like GT_ni, ⋃GT_ni denotes their union, i.e., ⋃GT_ni = ⋃_{g ∈ GT_ni} g.
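
To make the shape of the first case of Thm. 16 concrete, the following Python sketch evaluates it numerically. It assumes that all bounds are given as functions of the initial state; the data structures and names are illustrative and not taken from KoAT. The concrete bounds are those quoted in Ex. 17 and Ex. 24.

```python
def improved_runtime_bound(entry_locs, entry_trans, rb, sb_e, ceil_r, variables):
    """Evaluate the first case of Thm. 16 as a function of the initial state.

    entry_locs : iterable of entry locations
    entry_trans: dict location -> dict h -> list of transitions of h entering it
    rb         : dict transition -> non-probabilistic runtime bound (state -> number)
    sb_e       : dict (h, location, variable) -> expected size bound (state -> number)
    ceil_r     : dict location -> over-approximated ranking function (state -> number)
    """
    def bound(state):
        total = 0
        for loc in entry_locs:
            for h, ts in entry_trans[loc].items():
                visits = sum(rb[t](state) for t in ts)
                # Instantiate every variable of ceil_r(loc) by its expected size bound.
                sizes = {v: sb_e[(h, loc, v)](state) for v in variables}
                total += visits * ceil_r[loc](sizes)
        return total
    return bound

# Bounds quoted in Ex. 17 and Ex. 24.
rb = {"t0": lambda s: 1, "t3": lambda s: 1}
sb_e = {
    ("g0", "l1", "x"): lambda s: s["x"],
    ("g0", "l1", "y"): lambda s: s["y"],
    ("g2", "l2", "x"): lambda s: 3 * s["x"],
    ("g2", "l2", "y"): lambda s: 6 * s["x"] ** 2 + 2 * s["y"],
}
rb_g1 = improved_runtime_bound(["l1"], {"l1": {"g0": ["t0"]}}, rb, sb_e,
                               {"l1": lambda s: 2 * s["x"]}, ["x", "y"])
rb_g3 = improved_runtime_bound(["l2"], {"l2": {"g2": ["t3"]}}, rb, sb_e,
                               {"l2": lambda s: s["y"]}, ["x", "y"])
```

On any initial state, `rb_g1` agrees with 2·x and `rb_g3` with 6·x² + 2·y. A symbolic implementation would substitute polynomials instead of evaluating them.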


Similar to [18], our approach relies on combining bounds that one has com- 
puted earlier in order to derive new bounds. Here, bounds may be combined 
linearly, bounds may be multiplied, and bounds may even be substituted into 
other bounds. But in contrast to [18], sometimes one may combine expected 
bounds that were computed earlier and sometimes it is only sound to combine 
non-probabilistic bounds: If a new bound is computed by linear combinations of 
earlier bounds, then it is sound to use the “expected versions” of these earlier 
bounds. However, if two bounds are multiplied, then it is in general not sound to 
use their “expected versions”. Thus, it would be unsound to use the expected
runtime bounds RB_E(h) instead of the non-probabilistic bounds Σ_{t=(_,_,_,_,ℓ)∈h} RB(t)
on the entry transitions in Thm. 16 (a counterexample is given in [47]).²

In general, if bounds b_1, …, b_n are substituted into another bound b, then it
is sound to use “expected versions” of the bounds b_1, …, b_n if b is concave, see,
e.g., [10, 11, 40]. Since bounds from B do not contain negative coefficients, we
obtain that a finite³ bound b ∈ B is concave iff it is a linear polynomial (see [47]).

Thus, in Thm. 16 we may substitute expected size bounds SB_E(h, ℓ, x) into
⌈r(ℓ)⌉, since we restricted ourselves to linear ranking functions r and hence, ⌈r(ℓ)⌉
is also linear. Note that in contrast to [11], where a notion of concavity was used 
to analyze probabilistic term rewriting, a multilinear expression like x- y is not 
concave when regarding both arguments simultaneously. Hence, it is unsound to 
use such ranking functions in Thm. 16. See [47] for a counterexample to show 
why substituting expected bounds into a non-linear bound is incorrect in general. 
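
The role of concavity can be illustrated with a two-point distribution. The following Python sketch is not the counterexample from [47]. It only shows, for a hypothetical random size X, that substituting the expected value of X into the non-linear bound v² under-approximates the expected value of X², whereas a linear bound commutes with the expectation.

```python
from fractions import Fraction

# Hypothetical distribution of a variable's size: value -> probability.
DIST = {0: Fraction(1, 2), 2: Fraction(1, 2)}

def expectation(f):
    """Expected value of f(X) for X distributed according to DIST."""
    return sum(p * f(v) for v, p in DIST.items())

def linear(v):
    """A linear (hence concave) bound."""
    return 3 * v + 1

def quadratic(v):
    """A non-linear bound with non-negative coefficients."""
    return v * v

mean = expectation(lambda v: v)   # expected size of X
```

Here `linear(mean)` equals `expectation(linear)`, while `quadratic(mean)` is strictly smaller than `expectation(quadratic)`, so the latter substitution would not yield an upper bound.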


5 Computing Expected Size Bounds 


We first compute local bounds for one application of a transition (Sect. 5.1). 
To turn them into global bounds, we encode the data flow of a PIP in a graph. 
Sect. 5.2 then presents our technique to compute expected size bounds. 


5.1 Local Change Bounds and General Result Variable Graph 


We first compute a bound on the expected change of a variable during an
update. More precisely, for every general result variable (g, ℓ, x) we define a
bound CB_E(g, ℓ, x) on the change of the variable x that we can expect in one
execution of the general transition g when reaching location ℓ. So we consider
all t = (_, p, _, η, ℓ) ∈ g and the expected difference between the current value
of x and its update η(x). However, for η(x) ∈ Z[V], η(x) − x is not necessarily
from B because it may contain negative coefficients. Thus, we use the
over-approximation ⌈η(x) − x⌉ (where we always simplify expressions before applying
⌈·⌉, e.g., ⌈x − x⌉ = ⌈0⌉ = 0). Moreover, ⌈η(x) − x⌉ may contain temporary
variables. Let tv_t : V → B instantiate all temporary variables by the largest
possible value they can have after evaluating the transition t. Hence, we then use
tv_t(⌈η(x) − x⌉) instead. For tv_t, we have to use the underlying non-probabilistic
size bound SB for the program (since the scheduler determines the values of
temporary variables by non-deterministic (non-probabilistic) choice). If x is
updated according to a bounded distribution function d ∈ D, then as in Sect. 2,
let E(d) ∈ B denote a finite bound on d, i.e., E_abs(d(s)) ≤ |s|(E(d)) for all s ∈ Σ.

² An exception is the special case where r(ℓ) is constant. Then, our implementation
indeed uses the expected bound RB_E(h) instead of Σ_{t=(_,_,_,_,ℓ)∈h} RB(t) [47].

³ A bound is finite if it does not contain ∞. We always simplify expressions and thus,
a bound like 0·∞ is also finite, because it simplifies to 0, as usual in measure theory.


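Definition 18 (Expected Local Change Bound). Let SB be a size bound.
Then CB_E : GRV → B with


\[ \mathcal{CB}_{\mathbb{E}}(g,\ell,x) = \sum_{t = (\_,p,\_,\eta,\ell) \in g} p \cdot \mathrm{ch}_t(\eta(x), x), \]

where

\[
\mathrm{ch}_t(\eta(x), x) =
\begin{cases}
\mathfrak{E}(d), & \text{if } \eta(x) = d \in \mathcal{D} \\
\mathrm{tv}_t(\lceil \eta(x) - x \rceil), & \text{otherwise}
\end{cases}
\qquad \text{and} \qquad
\mathrm{tv}_t(y) =
\begin{cases}
\mathcal{SB}(t,y), & \text{if } y \notin \mathcal{PV} \\
y, & \text{if } y \in \mathcal{PV}
\end{cases}
\]


Example 19 (CB_E). For the PIP of Fig. 1, we have CB_E(g0, _, _) = CB_E(g2, _, _) =
CB_E(g3, ℓ2, x) = 0, since the respective updates are identities. Moreover,


\[ \mathcal{CB}_{\mathbb{E}}(g_1,\ell_1,x) = \tfrac{1}{2} \cdot \lceil (x-1) - x \rceil + \tfrac{1}{2} \cdot \lceil x - x \rceil = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 0 = \tfrac{1}{2}. \]

In a similar way, we obtain CB_E(g1, ℓ1, y) = x and CB_E(g3, ℓ2, y) = 1.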
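
The computation of CB_E(g1, ℓ1, x) in Ex. 19 can be sketched as follows. The code assumes the simplest case only, where every update changes x by a constant, so that the over-approximation ⌈·⌉ reduces to the absolute value. Updates by distributions and temporary variables are not covered, and the function name is illustrative.

```python
from fractions import Fraction

def local_change_bound(transitions):
    """CB_E for one variable: sum of p * |eta(x) - x| over the transitions.

    Each transition is a pair (p, delta) where delta = eta(x) - x is assumed
    to be a constant, so that the over-approximation is simply abs(delta).
    """
    return sum(p * abs(delta) for p, delta in transitions)

# g1 from Ex. 19: x is decreased by 1 or left unchanged, each with probability 1/2.
cb_g1_x = local_change_bound([(Fraction(1, 2), -1), (Fraction(1, 2), 0)])
```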


The following theorem shows that for any admissible configuration in a state
s', CB_E(g, ℓ, x) is an upper bound on the expected value of |s(x) − s'(x)| if s is
the next state obtained when applying g in state s' to reach location ℓ.


Theorem 20 (Soundness of CB_E). For any (g, ℓ, x) ∈ GRV, scheduler S,
s0 ∈ Σ, and admissible configuration c' = (_, _, s'), we have


\[ |s'|\,(\mathcal{CB}_{\mathbb{E}}(g,\ell,x)) \;\geq\; \sum_{c = (\ell,t,s) \in \mathrm{Conf},\ t \in g} \mathbb{P}_{\mathfrak{S},s_0}(c' \to c) \cdot |s(x) - s'(x)|. \]


To obtain global bounds from the local bounds CB_E(g, ℓ, x), we construct a
general result variable graph which encodes the data flow between variables. Let
pre(g) = ET_∅(ℓ_g) be the set of pre-transitions of g which lead into g's start
location ℓ_g. Moreover, for α = (g, ℓ, x) ∈ GRV let its active variables actV(α)
consist of all variables occurring in the bound x + CB_E(α) for α's expected size.


Definition 21 (General Result Variable Graph). The general result variable
graph has the set of nodes GRV and the set of edges GRVE, where


\[ \mathcal{GRVE} = \{\, ((g',\ell',x'),\,(g,\ell,x)) \mid g' \in \mathrm{pre}(g) \,\wedge\, \ell' = \ell_g \,\wedge\, x' \in \mathrm{actV}(g,\ell,x) \,\}. \]


Example 22 (General Result Variable Graph). The general result variable
graph for the PIP of Fig. 1 is shown below. For CB_E from Ex. 19, we have
actV(g1, ℓ1, x) = {x}, as x + CB_E(g1, ℓ1, x) = x + 1/2 contains no variable except x.
Similarly, actV(g1, ℓ1, y) = {x, y}, as x and y are contained in
y + CB_E(g1, ℓ1, y) = y + x. For all other α ∈ GRV, we have actV(_, _, x) = {x}
and actV(_, _, y) = {y}. As pre(g1) = {g0, g1}, the graph captures the dependence
of (g1, ℓ1, x) on (g0, ℓ1, x) and (g1, ℓ1, x), and of (g1, ℓ1, y) on
(g0, ℓ1, x), (g0, ℓ1, y), (g1, ℓ1, x), and (g1, ℓ1, y).
The other edges are obtained in a similar way.

[Figure: general result variable graph with the nodes (g0, ℓ1, x), (g0, ℓ1, y),
(g1, ℓ1, x), (g1, ℓ1, y), (g2, ℓ2, x), (g2, ℓ2, y), (g3, ℓ2, x), and (g3, ℓ2, y)]


5.2 Inferring Expected Size Bounds


We now compute global expected size bounds for the general result variables by 
considering the SCCs of the general result variable graph separately. As usual, 
an SCC is a maximal subgraph with a path from each node to every other node. 
An SCC is trivial if it consists of a single node without an edge to itself. We first 
handle trivial SCCs in Sect. 5.2.1 and consider non-trivial SCCs in Sect. 5.2.2. 
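
The construction of the graph and its SCCs can be sketched in Python. The data below is an assumed encoding of the PIP of Fig. 1, reconstructed from Ex. 15 and Ex. 22 (in particular, pre(g0) is assumed to be empty). The reachability-based SCC computation is chosen for brevity and is not claimed to be the one used in KoAT.

```python
from itertools import product

# Assumed data for the PIP of Fig. 1, reconstructed from Ex. 15 and Ex. 22.
PRE = {"g0": set(), "g1": {"g0", "g1"}, "g2": {"g0", "g1"}, "g3": {"g2", "g3"}}
START = {"g0": "l0", "g1": "l1", "g2": "l1", "g3": "l2"}   # start location of g
TARGET = {"g0": "l1", "g1": "l1", "g2": "l2", "g3": "l2"}  # location entered by g
VARS = ["x", "y"]

def act_v(g, x):
    """Active variables; only (g1, l1, y) depends on a second variable."""
    return {"x", "y"} if (g, x) == ("g1", "y") else {x}

NODES = [(g, TARGET[g], x) for g, x in product(PRE, VARS)]
EDGES = {(src, dst) for src in NODES for dst in NODES
         if src[0] in PRE[dst[0]]                 # g' is a pre-transition of g
         and src[1] == START[dst[0]]              # g' enters the start location of g
         and src[2] in act_v(dst[0], dst[2])}     # x' is active for (g, l, x)

def reachable(start):
    """All nodes reachable from start via at least one edge."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        for src, dst in EDGES:
            if src == node and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen

def sccs():
    """Group nodes into SCCs; a node that cannot reach itself forms its own SCC."""
    reach = {n: reachable(n) for n in NODES}
    components = []
    for n in NODES:
        comp = frozenset(m for m in NODES if m in reach[n] and n in reach[m])
        comp = comp or frozenset([n])
        if comp not in components:
            components.append(comp)
    return components, reach

def is_trivial(comp, reach):
    """An SCC is trivial if it is a single node without an edge to itself."""
    return len(comp) == 1 and all(n not in reach[n] for n in comp)
```

For this encoding, the nodes of g0 and g2 form trivial SCCs, while each node of g1 and g3 forms a non-trivial SCC because of its self-loop.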


5.2.1 Inferring Expected Size Bounds for Trivial SCCs By Thm. 20,
x + CB_E(g, ℓ, x) is a local bound on the expected value of x after applying g once
in order to enter ℓ. However, this bound is formulated in terms of the values of
the variables immediately before applying g. We now want to compute global
bounds in terms of the initial values of the variables at the start of the program.

If g is initial (i.e., g ∈ GT_0 since g starts in the initial location ℓ0), then
x + CB_E(g, ℓ, x) is already a global bound, as the values of the variables before
the application of g are the initial values of the variables at the program start.

Otherwise, the variables y occurring in the local bound x + CB_E(g, ℓ, x) have
to be replaced by the values that they can take in a full program run before
applying the transition g. Thus, we have to consider all transitions h ∈ pre(g)
and instantiate every variable y by the maximum of the values that y can have
after applying h. Here, we again over-approximate the maximum by the sum.

If CB_E(g, ℓ, x) is concave (i.e., a linear polynomial), then we can instantiate
its variables by expected size bounds SB_E(h, ℓ_g, y). However, this is unsound if
CB_E(g, ℓ, x) is not linear, i.e., not concave (see [47] for a counterexample). So in
this case, we have to use non-probabilistic bounds SB(t, y) instead.

As in Sect. 4.2, we use an underlying non-probabilistic bound pair (RB, SB)
and start with the expected pair (RB_E, SB_E) obtained by lifting (RB, SB)
according to Thm. 10. While Thm. 16 improves RB_E, we now improve SB_E. Here,
the SCCs of the general result variable graph should be treated in topological
order, since then one may first improve SB_E for result variables corresponding to
pre(g), and use that when improving SB_E for result variables of the form (g, _, _).


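Theorem 23 (Expected Size Bounds for Trivial SCCs). Let SB_E be an
expected size bound, SB a (non-probabilistic) size bound, and let α = (g, ℓ, x)
form a trivial SCC of the general result variable graph. Let size_E^α and size^α be
mappings from PV → B with size_E^α(y) = Σ_{h∈pre(g)} SB_E(h, ℓ_g, y) and size^α(y) =
Σ_{h∈pre(g), t=(_,_,_,_,ℓ_g)∈h} SB(t, y). Then SB'_E : GRV → B is an expected size bound,
where SB'_E(β) = SB_E(β) for β ≠ α and


\[
\mathcal{SB}'_{\mathbb{E}}(\alpha) =
\begin{cases}
x + \mathcal{CB}_{\mathbb{E}}(\alpha), & \text{if } g \in \mathcal{GT}_0 \\
\mathrm{size}_{\mathbb{E}}^{\alpha}(x + \mathcal{CB}_{\mathbb{E}}(\alpha)), & \text{if } g \notin \mathcal{GT}_0,\ \mathcal{CB}_{\mathbb{E}}(\alpha) \text{ is linear} \\
\mathrm{size}_{\mathbb{E}}^{\alpha}(x) + \mathrm{size}^{\alpha}(\mathcal{CB}_{\mathbb{E}}(\alpha)), & \text{if } g \notin \mathcal{GT}_0,\ \mathcal{CB}_{\mathbb{E}}(\alpha) \text{ is not linear}
\end{cases}
\]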


Example 24 (SB_E for Trivial SCCs). The general result variable graph in Ex. 22
contains 4 trivial SCCs formed by α_x = (g0, ℓ1, x), α_y = (g0, ℓ1, y), β_x = (g2, ℓ2, x),
and β_y = (g2, ℓ2, y). For all these general result variables, the expected local
change bound CB_E is 0 (see Ex. 19). Thus, it is linear. Since g0 ∈ GT_0, Thm. 23
yields SB'_E(α_x) = x + CB_E(α_x) = x and SB'_E(α_y) = y + CB_E(α_y) = y.

By treating SCCs in topological order, when handling β_x, β_y, we can assume
that we already have SB_E(α_x) = x, SB_E(α_y) = y and SB_E(g1, ℓ1, x) = 2·x,
SB_E(g1, ℓ1, y) = 6·x² + y (see Ex. 9) for the result variables corresponding to
pre(g2) = {g0, g1}. We will explain in Sect. 5.2.2 how to compute such expected
size bounds for non-trivial SCCs. Hence, by Thm. 23 we obtain SB'_E(β_x) =
size_E^{β_x}(x + CB_E(β_x)) = SB_E(α_x) + SB_E(g1, ℓ1, x) = 3·x and SB'_E(β_y) =
size_E^{β_y}(y + CB_E(β_y)) = SB_E(α_y) + SB_E(g1, ℓ1, y) = 6·x² + 2·y.


5.2.2 Inferring Expected Size Bounds for Non-Trivial SCCs Now we
handle non-trivial SCCs C of the general result variable graph. An upper bound
for the expected size of a variable x when entering C is obtained from SB_E(β)
for all general result variables β = (_, _, x) which have an edge to C.

To turn CB_E(g, ℓ, x) into a global bound, as in Thm. 23 its variables y have
to be instantiated by the values size^{(g,ℓ,x)}(y) that they can take in a full program
run before applying a transition from g. Thus, size^{(g,ℓ,x)}(CB_E(g, ℓ, x)) is a global
bound on the expected change resulting from one application of g. To obtain
an upper bound for the whole SCC C, we add up these global bounds for all
(g, _, x) ∈ C and take into account how often the general transitions in the SCC
are expected to be executed, i.e., we multiply with their expected runtime bound
RB_E(g). So while in Thm. 16 we improve RB_E using expected size bounds for
previous transitions, we now improve SB_E(C) using expected runtime bounds
for the transitions in C and expected size bounds for previous transitions.


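Theorem 25 (Expected Size Bounds for Non-Trivial SCCs). Let (RB_E,
SB_E) be an expected bound pair, (RB, SB) a (non-probabilistic) bound pair, and
let C ⊆ GRV form a non-trivial SCC of the general result variable graph where
GT_C = {g ∈ GT | (g, _, _) ∈ C}. Then SB'_E is an expected size bound:


\[
\mathcal{SB}'_{\mathbb{E}}(\alpha) =
\begin{cases}
\sum_{(\beta,\alpha) \in \mathcal{GRVE},\ \beta \notin C,\ \alpha \in C,\ \beta = (\_,\_,x)} \mathcal{SB}_{\mathbb{E}}(\beta) \;+\; \sum_{g \in \mathcal{GT}_C} \mathcal{RB}_{\mathbb{E}}(g) \cdot \Bigl( \sum_{\alpha' = (g,\_,x) \in C} \mathrm{size}^{\alpha'}(\mathcal{CB}_{\mathbb{E}}(\alpha')) \Bigr), & \text{if } \alpha = (\_,\_,x) \in C \\
\mathcal{SB}_{\mathbb{E}}(\alpha), & \text{otherwise}
\end{cases}
\]


Here we really have to use the non-probabilistic size bound size^{α'} instead of
size_E^{α'}, even if CB_E(α') is linear, i.e., concave. Otherwise we would multiply the
expected values of two random variables which are not independent.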


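Example 26 (SB_E for Non-Trivial SCCs). The general result variable graph in
Ex. 22 contains 4 non-trivial SCCs formed by α'_x = (g1, ℓ1, x), α'_y = (g1, ℓ1, y),
β'_x = (g3, ℓ2, x), and β'_y = (g3, ℓ2, y). By the results on SB_E, RB_E, CB_E, and SB
from Ex. 24, 17, 19, and 5, Thm. 25 yields the expected size bound in Ex. 9:


\begin{align*}
\mathcal{SB}'_{\mathbb{E}}(\alpha'_x) &= \mathcal{SB}_{\mathbb{E}}(\alpha_x) + \mathcal{RB}_{\mathbb{E}}(g_1) \cdot \mathrm{size}^{\alpha'_x}(\mathcal{CB}_{\mathbb{E}}(\alpha'_x)) = x + 2 \cdot x \cdot \tfrac{1}{2} = 2 \cdot x \\
\mathcal{SB}'_{\mathbb{E}}(\alpha'_y) &= \mathcal{SB}_{\mathbb{E}}(\alpha_y) + \mathcal{RB}_{\mathbb{E}}(g_1) \cdot \mathrm{size}^{\alpha'_y}(\mathcal{CB}_{\mathbb{E}}(\alpha'_y)) = y + 2 \cdot x \cdot \mathrm{size}^{\alpha'_y}(x) \\
&= y + 2 \cdot x \cdot \textstyle\sum_{i \in \{0,1,2\}} \mathcal{SB}(t_i, x) = 6 \cdot x^2 + y \\
\mathcal{SB}'_{\mathbb{E}}(\beta'_x) &= \mathcal{SB}_{\mathbb{E}}(\beta_x) + \mathcal{RB}_{\mathbb{E}}(g_3) \cdot \mathrm{size}^{\beta'_x}(\mathcal{CB}_{\mathbb{E}}(\beta'_x)) = 3 \cdot x + (6 \cdot x^2 + 2 \cdot y) \cdot 0 = 3 \cdot x \\
\mathcal{SB}'_{\mathbb{E}}(\beta'_y) &= \mathcal{SB}_{\mathbb{E}}(\beta_y) + \mathcal{RB}_{\mathbb{E}}(g_3) \cdot \mathrm{size}^{\beta'_y}(\mathcal{CB}_{\mathbb{E}}(\beta'_y)) = 6 \cdot x^2 + 2 \cdot y + (6 \cdot x^2 + 2 \cdot y) \cdot 1 \\
&= 12 \cdot x^2 + 4 \cdot y
\end{align*}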
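
The arithmetic of Ex. 26 can be checked numerically. The following Python sketch specializes Thm. 25 to SCCs that consist of a single node, which is the situation in this example. Bounds are modelled as functions of the initial state, and the function names are illustrative assumptions.

```python
from fractions import Fraction

def thm25_singleton(sb_e_entry, rb_e_g, size_cb):
    """Thm. 25 for a one-node SCC: entry size + expected runtime * global change."""
    return lambda s: sb_e_entry(s) + rb_e_g(s) * size_cb(s)

# Bounds quoted in Ex. 17, 19, 24 and 26, as functions of the initial state.
rb_e_g1 = lambda s: 2 * s["x"]
rb_e_g3 = lambda s: 6 * s["x"] ** 2 + 2 * s["y"]

sb_alpha_x = thm25_singleton(lambda s: s["x"], rb_e_g1, lambda s: Fraction(1, 2))
sb_alpha_y = thm25_singleton(lambda s: s["y"], rb_e_g1, lambda s: 3 * s["x"])
sb_beta_x = thm25_singleton(lambda s: 3 * s["x"], rb_e_g3, lambda s: 0)
sb_beta_y = thm25_singleton(lambda s: 6 * s["x"] ** 2 + 2 * s["y"], rb_e_g3,
                            lambda s: 1)
```

Evaluating these four functions on sample states gives the same values as the closed forms 2·x, 6·x² + y, 3·x, and 12·x² + 4·y.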


6 Related Work, Implementation, and Conclusion 


Related Work Our approach adapts techniques from [18] to probabilistic programs. 
As explained in Sect. 1, this adaption is not at all trivial (see our proofs in [47]). 

There has been a lot of work on proving PAST and inferring bounds on 
expected runtimes using supermartingales, e.g., [1, 11, 15, 16, 22-25, 29, 32, 48, 62].
While these techniques infer one (lexicographic) ranking supermartingale to 
analyze the complete program, our approach deals with information flow between 
different program parts and analyzes them separately. 

There is also work on modular analysis of almost sure termination (AST) 
[1, 25, 26, 37, 38, 48], i.e., termination with probability 1. This differs from our
results, since AST is compositional, in contrast to PAST (see, e.g., [41,42]). 

A fundamentally different approach to ranking supermartingales (i.e., to 
forward-reasoning) is backward-reasoning by so-called expectation transformers, 
see, e.g., [10,41, 42, 44-46, 50,52, 61]. In this orthogonal reasoning, [10, 41, 42,52] 
consider the connection of the expected runtime and size. While expectation 
transformers apply backward- instead of forward-reasoning, their correctness can 
also be justified using supermartingales. More precisely, Park induction for upper 
bounds on the expected runtime via expectation transformers essentially ensures 
that a certain stochastic process is a supermartingale (see [33] for details). 

To the best of our knowledge, the only available tools for the inference of upper 
bounds on the expected runtimes of probabilistic programs are [10,50, 61, 62]. 
The tool of [61] deals with data types and higher order functions in probabilistic 
ML programs and does not support programs whose complexity depends on 
(possibly negative) integers (see [55]). Furthermore, the tool of [48] focuses on 
proving or refuting (P)AST of probabilistic programs for so-called Prob-solvable 
loops, which do not allow for nested or sequential loops or non-determinism. So 
both [61] and [48] are orthogonal to our work. We discuss [10, 50, 62] below.


Implementation We implemented our analysis in a new version of our tool
KoAT [18]. KoAT is an open-source tool written in OCaml, which can also be 
downloaded as a Docker image and accessed via a web interface [43]. 

Given a PIP, the analysis proceeds as in Alg. 1. The preprocessing in Line 1 
adds invariants to guards (using APRON [39] to generate (non-probabilistic) 
invariants), unfolds transitions [19], and removes unreachable locations, transitions 
with probability 0, and transitions with unsatisfiable guards (using Z3 [49]).

    Input: PIP (PV, L, GT, ℓ0)
 1  preprocess the PIP
 2  (RB, SB) ← perform non-probabilistic analysis using [18]
 3  (RB_E, SB_E) ← lift (RB, SB) to an expected bound pair with Thm. 10
 4  repeat
 5      for all SCCs C of the general result variable graph in topological order do
 6          if C = {α} is trivial then SB'_E ← improve SB_E for C by Thm. 23
 7          else SB'_E ← improve SB_E for C by Thm. 25
 8          for all α ∈ C do SB_E(α) ← min{SB_E(α), SB'_E(α)}
 9      for all general transitions g ∈ GT do
10          RB'_E ← improve RB_E for GT_> = {g} by Thm. 16
11          RB_E(g) ← min{RB_E(g), RB'_E(g)}
12  until no bound is improved anymore
    Output: Σ_{g∈GT} RB_E(g)


Algorithm 1: Overall approach to infer bounds on expected runtimes


We start with a non-probabilistic analysis and lift the resulting bounds to an
initial expected bound pair (Lines 2 and 3). Afterwards, we first try to improve 
the expected size bounds using Thm. 23 and 25, and then we attempt to improve 
the expected runtime bounds using Thm. 16 (if we find a PLRF using Z3). To 
determine the “minimum” of the previous and the new bound, we use a heuristic 
which compares polynomial bounds by their degree. While we over-approximated 
the maximum of expressions by their sum to ease readability in this paper, KoAT 
also uses bounds containing “min” and “max” to increase precision. 
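
A degree-based comparison of this kind could look as follows. This is only a sketch: the polynomial representation, the treatment of ∞ as None, and the tie-breaking rule are assumptions, and KoAT's actual heuristic may differ.

```python
import math

def degree(bound):
    """Degree of a polynomial bound {exponent tuple: coefficient}; None means infinity."""
    if bound is None:
        return math.inf
    return max((sum(exps) for exps, coeff in bound.items() if coeff != 0), default=0)

def heuristic_min(old, new):
    """Keep the bound of smaller degree; on ties keep the previous bound (assumption)."""
    return new if degree(new) < degree(old) else old
```

For example, with exponent tuples over (x, y), the linear bound `{(1, 0): 2}` is preferred over both ∞ and the quadratic bound `{(2, 0): 6, (0, 1): 1}`.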

This alternating modular computation of expected size and runtime bounds is 
repeated so that one can benefit from improved expected runtime bounds when 
computing expected size bounds and vice versa. We abort this improvement of 
expected bounds in Alg. 1 if they are all finite (or when reaching a timeout). 

To assess the power of our approach, we performed an experimental evaluation 
of our implementation in KoAT. We did not compare with the tool of [62], since [62] 
expects the program to be annotated with already computed invariants. But for 
many of the examples in our experiments, the invariant generation tool [56] used 
by [62] did not find invariants strong enough to enable a meaningful analysis (and 
we could not apply APRON [39] due to the different semantics of invariants). 

Instead, we compare KoAT with the tools Absynth [50] and eco-imp [10] which
are both based on a conceptually different backward-reasoning approach. We ran
the tools on all 39 examples from Absynth’s evaluation in [50] (except recursive, 
which contains non-tail-recursion and thus cannot be encoded as a PIP), and on 
the 8 additional examples from the artifact of [50]. Moreover, our collection has 
29 additional benchmarks: 14 examples that illustrate different aspects of PIPs, 5 
PIPs based on examples from [50] where we removed assumptions, and 10 PIPs 
based on benchmarks from the TPDB [59] where some transitions were enriched 
with probabilistic behavior. The TPDB is a collection of typical programs used in 
the annual Termination and Complexity Competition [31]. We ran the experiments 
on an iMac with an Intel i5-2500S CPU and 12GB of RAM under macOS Sierra 
for Absynth and NixOS 20.03 for KoAT and eco-imp. A timeout of 5 minutes per
example was applied for all tools. The average runtime of successful runs was
4.26s for KoAT, 3.53s for Absynth, and just 0.93s for eco-imp.

Bound     | KoAT | Absynth | eco-imp
O(1)      |    6 |       6 |       6
O(n)      |   32 |      32 |      29
O(n²)     |    3 |       8 |       9
O(n^{>2}) |    0 |       0 |       0
EXP       |    0 |       0 |       0
∞         |    5 |       0 |       2
TO        |    0 |       0 |       0

Fig. 2: Results on benchmarks from [50]

Bound     | KoAT | Absynth | eco-imp
O(1)      |    2 |       1 |       2
O(n)      |   10 |       3 |       6
O(n²)     |   12 |       1 |       6
O(n^{>2}) |    2 |       0 |       0
EXP       |    1 |       0 |       0
∞         |    2 |      15 |      12
TO        |    0 |       9 |       3

Fig. 3: Results on our new benchmarks

Fig. 2 and 3 show the generated asymptotic bounds, where n is the maximal 
absolute value of the program variables at the program start. Here, “∞” indicates
that no finite time bound could be computed and “TO” means “timeout”. The 
detailed asymptotic results of all tools on all examples can be found in [43, 47]. 

Absynth and eco-imp slightly outperform KoAT on the examples from Absynth’s 
collection, while KoAT is considerably stronger than both tools on the additional 
benchmarks. In particular, Absynth and eco-imp outperform our approach on 
examples with nested probabilistic loops. While our modular approach can 
analyze inner loops separately when searching for probabilistic ranking functions, 
Thm. 16 then requires non-probabilistic time bounds for all transitions entering 
the inner loop. But these bounds may be infinite if the outer loop has probabilistic 
behavior itself. Moreover, in contrast to our work and [10], the approach of [50] 
does not require weakly monotonic bounds. 

On the other hand, KoAT is superior to Absynth and eco-imp on large exam- 
ples with many loops, where only a few transitions have probabilistic behavior 
(this might correspond to the typical application of randomization in practical 
programming). Here, we benefit from the modularity of our approach which treats 
loops independently and combines their bounds afterwards. Absynth and eco-imp 
also fail for our leading example of Fig. 1, while KoAT infers a quadratic bound. 
Hence, the tools have particular strengths on orthogonal kinds of examples. 

KoAT’s source code is available at
https://github.com/aprove-developers/KoAT2-Releases/tree/probabilistic.
To obtain a KoAT artifact, see
https://aprove-developers.github.io/ExpectedUpperBounds/
for a static binary and Docker image. This web site also provides all examples
from our evaluation, detailed outputs of our experiments, and a web interface to
run KoAT directly online.


Conclusion We presented a new modular approach to infer upper bounds on the 
expected runtimes of probabilistic integer programs. To this end, non-probabilistic 
and expected runtime and size bounds on parts of the program are computed in 
an alternating fashion and then combined to an overall expected runtime bound. 
In the evaluation, our tool KoAT succeeded on 91% of all examples, while the 
main other related tools (Absynth and eco-imp) only inferred finite bounds for 
68% resp. 77% of the examples. In future work, it would be interesting to consider 
a modular combination of these tools (resp. of their underlying approaches). 


Acknowledgements We thank Carsten Fuhs for discussions on initial ideas.


Abstract. Software developers frequently check their code changes by
running a set of tests against their code. Tests that can nondeterministi- 
cally pass or fail when run on the same code version are called flaky tests. 
These tests are a major problem because they can mislead developers to 
debug their recent code changes when the failures are unrelated to these 
changes. One prominent category of flaky tests is order-dependent (OD) 
tests, which can deterministically pass or fail depending on the order in 
which the set of tests are run. By detecting OD tests in advance, de- 
velopers can fix these tests before they change their code. Due to the 
high cost required to explore all possible orders (n! permutations for n 
tests), prior work has developed tools that randomize orders to detect 
OD tests. Experiments have shown that randomization can detect many 
OD tests, and that most OD tests depend on just one other test to fail. 
However, there was no analysis of the probability that randomized or- 
ders detect OD tests. In this paper, we present the first such analysis and 
also present a simple change for sampling random test orders to increase 
the probability. We finally present a novel algorithm to systematically 
explore all consecutive pairs of tests, guaranteeing to detect all OD tests 
that depend on one other test, while running substantially fewer orders 
and tests than simply running all test pairs. 


Keywords: Flaky tests · Order dependent · Test-pair coverage


1 Introduction 


The most common way that developers check their software is through frequent 
regression testing performed while they develop software. Developers run regres- 
sion tests to check that recent code changes do not break existing functionality. 
A major problem for regression testing is flaky tests [27], which can nondeter- 
ministically pass or fail when run on the same code version. The failures from
these tests can mislead developers to debug their recent changes while the
failures can be due to a variety of reasons unrelated to the changes. Many software
organizations have reported flaky tests as one of their biggest problems in soft- 
ware development, including Apple [18], Facebook [5,10], Google [8,30,31,43,48], 
Huawei [16], Microsoft [11,12,20,21], and Mozilla [40]. 


These flaky tests are among the tests, called a test suite, that developers run
during regression testing; a test suite is most often specified as a set, not a 
sequence, of tests. Having a test suite as a set provides benefits for regression 
testing techniques such as selection, prioritization, and parallelization [23,45]. 
The test execution platform can choose to run these tests in various test orders. 
For example, for projects using Java, the most popular testing framework is 
JUnit [17], and the most popular build system is Maven [28]. Tests in JUnit 
are organized in a set of test classes, each of which has a set of test methods. 
By default, Maven runs tests using the Surefire plugin [29], which does not 
guarantee any order of test classes or test methods. However, the use of Surefire 
and JUnit does not interleave the test methods from different test classes in a 
test order. The same structure is common for many other testing frameworks 
such as TestNG [41], Cucumber [4], and Spock [38]. 


One prominent category of flaky tests is deterministic order-dependent (OD) 
tests [22,24,32,47], which can deterministically pass or fail in various test orders, 
with at least one order in which these tests pass and at least one other order 
in which they fail. Other flaky tests are non-deterministic (ND) tests, which 
are flaky due to reasons other than solely the test order [24]; for at least one 
test order, these tests can nondeterministically pass or fail even in that same 
test order. Our iDFlakies work [22] has released the iDFlakies dataset [15] of 
flaky tests in open-source Java projects. We obtained this dataset by running 
test suites many times in randomized test orders, collecting test failures, and 
classifying failed tests as OD or ND flaky tests. In total, 50.5% of the dataset 
are OD tests, while the remaining 49.5% are ND tests. 


Prior research has proposed multiple tools [2,6,9,14,22,47] to detect OD tests. 
Some of the tools [9,14] search for potential OD tests and may therefore report 
false alarms, i.e., tests that cannot fail in the current test suite (but may fail in 
some extended test suite). The other tools [2,6,22,47] detect OD tests that actu- 
ally fail by running multiple randomized orders of the test suite. Running tests in 
random orders is also available in many testing platforms, e.g., Surefire for Java 
has a mode to randomize the order of test classes, pytest [35] for Python has 
the --random-order option, and rspec [36] for Ruby has the --order random 
option. While these tools can detect many OD tests, the tools run random orders 
and hence can miss running test orders in which OD tests would fail. The listed 
prior work has not studied the flake rates, i.e., the probability that an OD test 
would fail when run in (uniformly) sampled test orders. 

Our iFixFlakies work [37] has studied the causes of failures for OD tests. We 
find that the vast majority of OD tests are related to pairs of tests, i.e., each 
OD test would pass or fail due to the sharing of some global state with just one 
other test. Our iFixFlakies work has also defined multiple kinds of tests related
to OD tests. Each OD test belongs to one of two kinds: (1) brittle, which is a
test that fails when run by itself but passes in a test order where the test is 
preceded by a state-setter; and (2) victim, which is a test that passes when run 
by itself but fails in a test order where the test is preceded by a (state-) polluter 
unless a (state-) cleaner runs in between the polluter and the victim. Most of the 
work in this paper focuses on victim tests because most OD tests are victims 
rather than brittles (e.g., 91% of the truly OD tests in the iDFlakies dataset are 
victims [15]), and the analysis for brittles often follows as a simple special case 
of the analysis for victims. 


This paper makes the following two main contributions. 


Probability Analysis. We develop a methodology to analytically obtain the 
flake rates of OD tests and propose a simple change to the random sampling 
of test orders to increase the probability of detecting OD tests. A flake rate is 
defined as the ratio of the number of test orders in which an OD test fails divided 
by the total number of orders. Flake rates can help researchers analytically com- 
pare various algorithms (e.g., comparing reversing a passing order to sampling a 
random order as shown in Section 4.4) and help practitioners prioritize the fixing 
of flaky tests. Specifically, we study the following problem: determine the flake 
rate for a given victim test with its set of polluters and a set of cleaners for each 
polluter. We first derive simple formulas with two main assumptions: (A1) all 
polluters have the same set of cleaners and (A2) all of the victim, polluters, and 
cleaners are in the same test class. We then derive formulas that keep A1 but 
relax A2. Our results on 249 real flaky tests show that our formulas are appli- 
cable to 236 tests (i.e., only 13 tests violate A1). To relax both assumptions, we 
propose an approach to estimate the flake rate without running test orders. Our 
analysis finds that some OD tests have a rather low flake rate, as low as 1.2%. 


Systematic Test-Pair Exploration. Because random sampling of test orders 
may miss test orders in which OD tests fail, we propose a systematic approach to 
cover all consecutive test pairs to detect OD tests. We present an algorithm that 
systematically explores all consecutive test pairs, guaranteeing the detection of 
all OD tests that depend on one other test, while running substantially fewer 
tests than a naive exploration that runs every pair by itself. Our algorithm builds 
on the concept of Tuscan squares [7], studied in the field of combinatorics. Given 
a test suite, the algorithm generates a set of test orders, each consisting of at least 
two distinct tests and at most all of the tests from the test suite, that cover all 
of the consecutive test pairs, while trying to minimize the cost of running those 
test orders. The algorithm can cover pairs of tests from the same and different 
classes, while considering only the test orders that do not interleave tests from 
different test classes, being a common constraint of testing frameworks such as 
JUnit [17]. Our analysis shows that the algorithm runs substantially fewer tests 
than naive exploration. To experiment with the new algorithm based on Tuscan 
squares, we run some of the test orders generated by the algorithm for some of 
the test suites in the iDFlakies dataset. Our experiments detect 44 new OD tests, 
not detected in prior work [22,24,25], and we have added the newly detected tests 
to the Illinois Dataset of Flaky Tests [19]. 




public void testMRAppMasterSuccessLock() { // testV for short
  ... // setup MapReduce job, e.g., set conf and userName
  MRAppMaster appMaster =
    new MRAppMasterTest("appattempt_...", "container_...", "host", -1,
      -1, System.currentTimeMillis(), false, false);
  try {
    MRAppMaster.initAndStartAppMaster(appMaster, conf, userName);
  } catch (IOException e) { ... }
  ... // assert the state and some properties of appMaster
  appMaster.stop();
}


Fig. 1. Victim OD test from Hadoop’s TestMRAppMaster class. 


public void testSigTermedFunctionality() { // testP for short
  JHEventHandlerForSigtermTest jheh =
    new JHEventHandlerForSigtermTest(Mockito.mock(AppContext.class), 0);
  jheh.addToFileMap(Mockito.mock(JobId.class));
  ... // have jheh handle a few events
  jheh.stop();
  ... // assert whether the events were handled properly
}


Fig. 2. Polluter test from Hadoop’s TestJobHistoryEventHandler class.


2 Background and Example 


We use an example to introduce some key concepts for OD tests and to illustrate 
challenges in debugging these tests. We represent a test order as a sequence 
of tests $(t_1, t_2, \ldots, t_l)$. In Java, each test order is executed by a Java Virtual
Machine (JVM) that starts from the initial state (e.g., all shared pointer variables
initialized to null) and then runs each test, which potentially modifies the shared
state. Each test is run at most once in one JVM run. (Thus, covering test orders 
and test pairs has to be done with a set of test orders and cannot be done with 
just one very long order, e.g., using superpermutations [13].) A test v is a victim 
if it passes in the order (v) but fails in another order; the other order usually 
contains a single polluter test p (besides many other tests) such that v fails even 
in the order (p,v). Moreover, the test suite may contain a cleaner test c such 
that v passes in the order (p,c,v). Note that test orders may contain more tests 
besides polluters and cleaners for a victim v, but these other tests do not modify 
the relevant state and do not affect whether v passes or not in any order. Precise 
definitions for these tests are in our previous work [37]. 

Figure 1 shows a snippet of a victim test, testMRAppMasterSuccessLock 
(in short testV), from the widely used Hadoop project [1]. The test suite for
this test has 392 tests. This test is from the MapReduce (MR) framework and 
aims to check an MR application. This test is a victim because it passes when 
run by itself but has two polluter tests. If the victim is run after either one of its 
polluter tests (and no cleaner runs in between the polluter and the victim), then 
the victim fails with a NullPointerException. Figure 2 shows a snippet of one 
of these two polluter tests, testSigTermedFunctionality (in short testP).

These tests form a polluter-victim pair because they share a global state, 
namely all “active” jobs stored in a static map in the JobHistoryEventHandler 
class. (In JUnit 4, only the heap state reachable from the class fields declared
as static is shared across tests; JUnit does not automatically reset that state,
but developers can add setup and teardown methods to reset the state.) To
check an MR application, testV first sets up some state (Line 2), then creates
an MR application (Line 3), and starts the application (Line 7). The
NullPointerException arises when the test tries to stop the MR application 
(Line 10). Specifically, the appMaster accesses the shared map data structure 
that tracks all jobs run by any application. When testV is run after testP, then 
appMaster will attempt to stop a job created by the polluter, although the job 
has already been stopped. 


This static map is empty when the JVM starts running a test order, and 
it is also explicitly cleared by some tests. In fact, we find 11 cleaner tests 
that clear the map, and the victim passes when any one of these 11 tests 
is run between testP and testV. Interestingly, for the other polluter test,
testTimelineEventHandling (in short testP’), the victim fails for the same 
reason, but testP’ has 31 cleaners—the same 11 as testP and 20 other cleaners. 
Our manual inspection finds that the testP’ polluter has other cleaners because 
the job created by testP’ is named job_200_0001, while the job created by the 
testP polluter is a mock object. The 20 other cleaners also create and stop 
jobs named job_200_0001 and therefore act as cleaners for the testP’ polluter
but not the testP polluter. This example illustrates not only how victims and 
polluters work but also the complexity in how these tests interact with cleaners. 


In Section 4.2, we explore how to compute the flake rate for a victim test, i.e., 
the probability that the test fails in a randomly sampled test order of all tests 
in the test suite. For this example, the 392 tests could, in theory, be run in 392!
($\approx 10^{848}$) test orders (permutations), but in practice, JUnit never interleaves
test methods from different test classes. These tests are split into 48 classes
that actually have $\approx 10^{734}$ test orders that JUnit could run. The relevant 34
tests (1 victim, 2 polluters, and 31 cleaners) belong to 8 test classes: 2 polluters
belong to one class (TestJobHistoryEventHandler), 11 cleaners belong to the
same class as the polluters, 1 cleaner belongs to the same class as the victim 
(TestMRAppMaster), and the remaining 19 cleaners belong to six other classes. 
For this victim, randomly sampling the orders that JUnit could run gives a 
flake rate of 4.5%. In Section 4.4, we propose a simple change to increase the 
probability of detecting OD tests by running a reverse of each passing test order. 
For this victim, the conditional probability that the reverse order fails is 4.9%. 
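
As a quick sanity check on the size of this search space, the following Python sketch computes the order of magnitude of 392! with the log-gamma function. Python is used here only for illustration and is not part of the tooling described in this paper; the helper name `log10_factorial` is hypothetical.

```python
import math

def log10_factorial(n: int) -> float:
    """Return log10(n!) via lgamma, since lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(10)

# Order of magnitude of the number of permutations of 392 tests.
magnitude = math.floor(log10_factorial(392))
print(f"392! is roughly 10^{magnitude}")
```

The count of class-compatible orders cannot be reproduced the same way without the sizes of all 48 classes, which are not listed here.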


A commonly asked question is whether all detected OD tests should be fixed. 
While ideally all flaky tests should be fixed, some are not fixed [21,23]. For the 
majority of OD tests, fixing them is good to prevent flaky-test failures that 
can mislead the developers into debugging the wrong parts of the code; also, 
fixing OD tests enables tests to be run in any order, which then enables the use 
of beneficial regression-testing techniques [23]. Some OD tests are intentionally 
run in specific orders (e.g., using the @FixMethodOrder annotation in JUnit) to 
speed up testing by reusing states. We have submitted fixes for a large number 
of flaky tests in our prior work [19]. 

We next formalize the concepts that we have introduced informally and define
some new concepts. Let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of $n$ tests partitioned into $k$
classes $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$. We use $class(t)$ to denote the class of test $t$. Each
class $C_i$ has $n_i = |\{t \in T \mid class(t) = C_i\}|$ tests.

We use $\omega(T')$ to denote a test order, i.e., a permutation of tests in $T' \subseteq T$,
and drop $T'$ when clear from the context. We use $\omega_i$ to denote the $i$-th test
in the test order $\omega$, and $|\omega|$ to denote the length of a test order as measured
by the number of tests. We use $t <_\omega t'$ to denote that test $t$ is before $t'$ in
the test order $\omega$. We will analyze some cases that allow all $n!$ permutations,
potentially interleaving tests from different classes. We use $\Omega_A(T)$ to denote the
set of all test orders for $T$. Some testing tools [47] explore all these test orders,
potentially generating false alarms because most testing frameworks [4,17,38,41]
do not allow all these test orders.

We are primarily concerned with class-compatible test orders where all tests
from each class are consecutive, i.e., if $class(\omega_i) = class(\omega_{i'})$, then for all $j$
with $i < j < i'$, $class(\omega_j) = class(\omega_i)$. We use $\Omega_C(T)$ to denote the set of all
class-compatible test orders for $T$. The number of such class-compatible test
orders is $k! \prod_{i=1}^{k} n_i!$. Section 4.2 presents how to compute the flake rate, i.e.,
the percentage of test orders in which a given victim test (with its polluters and
cleaners) fails.
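
The count of class-compatible test orders can be checked by brute force on a tiny suite. The sketch below is illustrative only: the suite (class A with 2 tests, class B with 3 tests) and the helper names are hypothetical.

```python
import math
from itertools import permutations

def count_class_compatible(class_sizes):
    """k! * prod(n_i!) for k classes with n_i tests each."""
    total = math.factorial(len(class_sizes))
    for n_i in class_sizes:
        total *= math.factorial(n_i)
    return total

def is_class_compatible(order, class_of):
    """True if all tests of each class are consecutive in the order."""
    seen = set()
    prev = None
    for t in order:
        c = class_of[t]
        if c != prev:
            if c in seen:
                return False  # the class was left earlier and is re-entered
            seen.add(c)
            prev = c
    return True

# Hypothetical suite: class A has 2 tests, class B has 3 tests.
class_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}
brute = sum(is_class_compatible(p, class_of) for p in permutations(class_of))
print(brute, count_class_compatible([2, 3]))  # both are 2! * 2! * 3! = 24
```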

Section 5 presents how to systematically generate test orders to ensure that
all test pairs are covered. A test pair $(t, t')$ consists of two distinct tests $t \neq t'$. We
say that a test order $\omega$ covers a test pair $(t, t')$, in notation $cover(\omega, (t, t'))$, iff the
two tests are consecutive in $\omega$, i.e., $\omega = (\ldots, t, t', \ldots)$. Considering consecutive
tests is important because a victim may not fail if not run right after a polluter,
i.e., when a cleaner is run between the polluter and the victim. A set of test
orders $\Omega$ covers the union of test pairs covered by each test order $\omega \in \Omega$. In
general, test orders in a set can be of different lengths. Each test order $\omega$ covers
$|\omega| - 1$ test pairs.

We distinguish intra-class test pairs, where $class(t) = class(t')$, and inter-class
test pairs, where $class(t) \neq class(t')$. Of the total $n(n - 1)$ test pairs, each
class $C_i$ has $n_i(n_i - 1)$ intra-class test pairs, and the number of inter-class test
pairs is $2 \sum_{1 \le i < j \le k} n_i n_j$. Each class-compatible test order of all $T$ tests covers
$n_i - 1$ intra-class test pairs for each class $C_i$ and $k - 1$ inter-class test pairs.
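
To make the pair counting concrete, here is a small sketch that extracts the consecutive pairs covered by one order and checks that the intra-class and inter-class counts add up to n(n - 1). The class sizes, test names, and function names are hypothetical.

```python
from itertools import combinations

def covered_pairs(order):
    """Ordered pairs (t, t') of consecutive tests in one test order."""
    return set(zip(order, order[1:]))

def pair_counts(class_sizes):
    """Return (intra, inter): numbers of ordered intra- and inter-class pairs."""
    intra = sum(n * (n - 1) for n in class_sizes)
    inter = 2 * sum(a * b for a, b in combinations(class_sizes, 2))
    return intra, inter

sizes = [2, 3]  # hypothetical: class A has 2 tests, class B has 3 tests
n = sum(sizes)
intra, inter = pair_counts(sizes)
assert intra + inter == n * (n - 1)

# One class-compatible order covers |order| - 1 pairs in total.
order = ["a1", "a2", "b1", "b2", "b3"]
print(sorted(covered_pairs(order)))
```

In this order, three of the four covered pairs are intra-class and one, ("a2", "b1"), is inter-class, matching the per-order counts stated above.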

We aim to generate a set of test orders $\Omega$ that cover all test pairs. If we
consider $\Omega_A(T)$ that allows all test orders, we need at least $n$ test orders to
cover all $n(n - 1)$ test pairs. When we have only one class or all classes have only
one test, then all test orders are class-compatible. However, consider the more
common case when we have more than one class and some class has more than
one test. If we consider $\Omega_C(T)$ that allows only class-compatible test orders,
we need at least $\max_{i=1}^{k} n_i$ test orders to cover all intra-class test pairs and at
least $M = 2 \sum_{1 \le i < j \le k} n_i n_j / (k - 1)$ test orders to cover all inter-class test pairs;
because $M > \max_{i=1}^{k} n_i$, we need at least $M$ class-compatible test orders to
cover all test pairs.

3 This problem should not be confused with pairwise testing [33], which typically aims
to cover pairs of values from different test parameters.
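
The two lower bounds follow from simple counting: class C_i has n_i(n_i - 1) intra-class pairs and one order covers at most n_i - 1 of them, while each class-compatible order covers only k - 1 inter-class pairs. A minimal sketch of that arithmetic, with hypothetical class sizes and a hypothetical function name, assuming at least two classes:

```python
from itertools import combinations

def min_class_compatible_orders(class_sizes):
    """Lower bounds (intra, inter) on the number of class-compatible orders.

    Requires k >= 2 classes. The inter-class bound is rounded up because
    the number of test orders must be an integer.
    """
    k = len(class_sizes)
    intra_bound = max(class_sizes)
    inter_pairs = 2 * sum(a * b for a, b in combinations(class_sizes, 2))
    inter_bound = -(-inter_pairs // (k - 1))  # ceiling division
    return intra_bound, inter_bound

print(min_class_compatible_orders([2, 3]))     # (3, 12)
print(min_class_compatible_orders([1, 1, 1]))  # (1, 3)
```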

More precisely, we aim to generate a set of test orders $\Omega$ that has the lowest
cost for test execution. The cost for each test order $\omega$ can be modeled well as
a sum of a fixed cost $Cost_0$ (e.g., corresponding to the time required to start a
JVM and load required classes) and a cost for each test (e.g., the time to execute
the test method): $Cost(\omega) = Cost_0 + \sum_{t \in \omega} Cost(t)$. The cost for a set of test
orders is then simply the sum of individual costs: $Cost(\Omega) = \sum_{\omega \in \Omega} Cost(\omega)$. For
example, a trivial way to cover all test pairs is with a set of test orders where
each test order is just a test pair: $\Omega_p = \{(t, t') \mid t, t' \in T \wedge t \neq t'\}$; however, the
cost is unnecessarily high: $Cost(\Omega_p) = n(n - 1) Cost_0 + 2(n - 1) Cost(T)$, where
$Cost(T) = \sum_{t \in T} Cost(t)$.

To simplify, we can assume that each test in $T$ has the same cost, say, $Cost_1$,
and then $Cost(\Omega_p) = n(n - 1) Cost_0 + 2n(n - 1) Cost_1$. In the optimal case, each
test order would be a permutation of $n$ tests covering $n - 1$ test pairs, and the
number of test orders would be just $n(n - 1)/(n - 1) = n$. Therefore, the lowest
cost is $Cost(\Omega_{opt}) = n Cost_0 + n^2 Cost_1$, demonstrating that the factor for $Cost_0$
can be substantially reduced, while the factor for $Cost_1$ is nearly halved (from
$2n(n - 1)$ to $n^2$).
However, in most realistic cases, due to the constraints of class-compatible test
orders and the big differences in the number of tests across different classes, we
cannot reach the optimal case.
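
The gap between the two cost expressions is easy to see numerically. The sketch below plugs in hypothetical per-order and per-test costs (2 s of JVM startup, 0.1 s per test); these numbers are assumptions for illustration, not measurements from this paper.

```python
def cost_all_pairs(n, cost0, cost1):
    """Cost of running each ordered pair as its own two-test order."""
    return n * (n - 1) * cost0 + 2 * n * (n - 1) * cost1

def cost_optimal(n, cost0, cost1):
    """Cost of the ideal n full-length orders (not always achievable)."""
    return n * cost0 + n * n * cost1

n, cost0, cost1 = 392, 2.0, 0.1  # hypothetical costs in seconds
naive = cost_all_pairs(n, cost0, cost1)
ideal = cost_optimal(n, cost0, cost1)
print(f"naive: {naive:.0f} s, ideal: {ideal:.0f} s, ratio: {naive / ideal:.1f}x")
```

With these assumed costs the fixed per-order cost dominates the naive approach, which is why reducing the number of test orders matters more than reducing the number of test executions.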


3.1 Dataset for Evaluation 


Besides deriving some analytical results, we also run some empirical experiments 
on flaky tests from Java projects. Our recent work [25] ran the iDFlakies tool 
on most test suites in the projects from the iDFlakies dataset [15] using the 
configurations recommended by our iDFlakies work [22]. Specifically, we ran 100 
randomly sampled test orders from $\Omega_C(T)$ and 1 test order that is the reverse
order of what Maven Surefire [29] runs by default. Note that unlike our work in 
Section 4.4, where we propose running a reverse test order of every test order 
where all tests passed, the one reverse order that we ran in our recent work [25] 
may or may not have been from a passing test order, and the reverse order is 
run only once and not for every passing test order. 

Each project in the iDFlakies dataset is a Maven-based, Java project orga- 
nized into one or more modules, which are (sub)directories that organize code 
under test and test code. Each module contains its own test suite. For the re- 
mainder of the paper, we use the 121 modules in which our recent work [25] 
found at least one flaky test (but not necessarily OD test). To illustrate diver- 
sity among these 121 modules, the number of classes ranges from 1 to 2215, with 
an average of 61, and the total number of tests ranges from 1 to 4781, with an 
average of 287. The number of tests per class ranges from 1 to 200, with an 
average of 4.8. 

When we run some of the test orders generated by our systematic test-pair 
exploration as described in Section 5.2, we detect a total of 249 OD tests in 44 
of the 121 modules. Of the 249 OD tests, 57 are brittles and 192 are victims.
Compared to the OD tests detected in our prior work [22,24,25] that used the 
iDFlakies dataset, we find 44 new OD tests that have not been detected before. 
Of the 44 OD tests, 1 is brittle and 43 are victims. One of the newly detected 
victim tests (testMRAppMasterSuccessLock) is shown in Section 2. 


4 Analysis of Flake Rate and Simple Algorithm Change 


We next discuss how to compute the flake rate for each OD test. Let T be a test 
suite with an OD test. Prior work [22,24,25,47] would run many test orders of T 
and compute the flake rate for each test as a ratio of the number of test failures 
and the number of test runs. However, failures of flaky tests are probabilistic, 
and running even many test orders may not suffice to obtain the true flake rate 
for each test. Running more test orders is rather costly in machine time; in the 
limit, we may need to run all |T|! permutations to obtain the true flake rate for 
OD tests. To reduce machine time needed for computing the flake rate for OD 
tests, we first propose a new procedure, and then derive formulas based on this 
procedure. We finally show a simple change for sampling random test orders to 
increase the probability of detecting OD tests. 


4.1 Determining Test Outcome without Running a Test Order 


We use a two-step procedure to determine the test outcome for a given OD test. 
We assume that some prior runs already detected the OD test, and the goal is 
to determine the test outcome for some new test orders that were not run. 

In Step 1, we classify how each test from T relates to each OD test in a simple 
setting that runs only up to three tests. Specifically, we first determine whether 
an OD test t is a victim or a brittle by running the test in isolation, i.e., just
(t), by itself 10 times: if t always passes, it is considered a victim (although it
may be an ND test); if t always fails, it is considered a brittle (although it may
be an ND test); and if t sometimes passes and sometimes fails, it is definitely an 
ND, not OD, test. This approach was proposed for iFixFlakies [37], and using 
10 runs is a common industry practice to check whether a test is flaky [31,40]. 
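
The isolation check can be sketched as follows. The callable `run_alone` is a hypothetical stand-in for launching the test by itself in a fresh JVM; it is not an API of iFixFlakies or JUnit.

```python
def classify_in_isolation(run_alone, runs=10):
    """Classify a known flaky test from repeated isolated runs.

    run_alone: hypothetical zero-argument callable that runs the test by
    itself and returns True if it passes.
    """
    results = [run_alone() for _ in range(runs)]
    if all(results):
        return "victim"   # always passes alone (could still be an ND test)
    if not any(results):
        return "brittle"  # always fails alone (could still be an ND test)
    return "ND"           # mixed outcomes in isolation: not order-dependent

print(classify_in_isolation(lambda: True))
```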

We then find (1) for each victim, all its single polluters in T and also all 
single cleaners for each polluter, and (2) for each brittle, all its single state- 
setters in T. To find polluters (resp. state-setters) of a victim (resp. brittle) test, 
iFixFlakies [37] takes as input a test order (of entire T) where the test failed 
(resp. passed) and then searches the prefix of the test in that test order using 
delta debugging [46] (an extended version of binary search). While iFixFlakies 
can find all polluters (resp. state-setters) in the prefix, it does not necessarily 
find all polluters in T, and it takes substantial time to find these polluters using 
delta debugging. The experiments show that in 98% of cases, binary search finds 
one test to be a polluter, although some rare cases need a polluter group that 
consists of two tests. 

We propose a simpler and faster approach to find polluters (resp. state-setters)
for the most common case: for each victim $v$ (resp. brittle $b$) and each
test $t \in T \setminus \{v\}$ (resp. $t \in T \setminus \{b\}$), we run a pair of the test and the victim (resp.
brittle), i.e., $(t, v)$ (resp. $(t, b)$). If the victim fails (resp. brittle passes), then the
test $t$ is a polluter (resp. state-setter). Further, for each victim $v$, its polluter
$p$, and a test $t \in T \setminus \{v, p\}$, we run a triple of $(p, t, v)$, and if $v$ passes, then $t$
is a cleaner for the pair of $v$ and $p$. Note that for the same victim $v$, different
polluters may have different cleaners such as the example presented in Section 2.
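
The pair-and-triple search for a victim can be sketched as below. The runner `victim_passes` is hypothetical: it stands for executing the given short order in a fresh JVM and reporting whether the last test (the victim) passed. The simulated runner models a single shared flag and is only there to make the sketch executable.

```python
def find_polluters_and_cleaners(tests, victim, victim_passes):
    """Find single polluters via pairs (t, v) and cleaners via triples (p, t, v)."""
    polluters = [t for t in tests
                 if t != victim and not victim_passes([t, victim])]
    cleaners = {}
    for p in polluters:
        cleaners[p] = [t for t in tests
                       if t not in (victim, p) and victim_passes([p, t, victim])]
    return polluters, cleaners

def simulated_runner(order):
    """Toy model: 'p' pollutes shared state, 'c' resets it, others ignore it."""
    polluted = False
    for t in order[:-1]:
        if t == "p":
            polluted = True
        elif t == "c":
            polluted = False
    return not polluted  # the victim (last test) passes only on clean state

print(find_polluters_and_cleaners(["x", "p", "c", "v"], "v", simulated_runner))
```

For n tests, this needs n - 1 pair runs plus at most n - 2 triple runs per polluter, all on orders of at most three tests.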

In Step 2, we determine whether each OD test passes or fails in a given test 
order using only the abstraction from Step 1, without actually running the test 
order. We focus on victims because they are more complex than brittles; brittles 
can be viewed as special cases with slight changes (requiring a state-setter to 
run before a brittle to pass, rather than requiring a polluter not to run before 
a victim to pass). Without loss of generality, we consider one victim at a time. 
Intuitively, the victim fails in a test order if a polluter is run before the victim 
without a cleaner between the polluter and the victim. Formally, we define the 
test outcome as follows. 


Definition 1 (Test Outcome from Abstraction). Let $T$ be a test suite with
one victim $v \in T$, polluters $P \subset T$, and a family of cleaners $C_p \subset T$ indexed by
each polluter $p \in P$. The outcome of $v$ in a test order $\omega$ is defined as follows:


$fail(\omega) = \exists p \in P.\ p <_\omega v \wedge \nexists c \in C_p.\ p <_\omega c \wedge c <_\omega v$; $pass(\omega) = \neg fail(\omega)$.
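
Read as a predicate, the definition says that the victim fails if some polluter runs before it with none of that polluter's cleaners in between. A direct sketch, where `cleaners_of` is a hypothetical dictionary mapping each polluter to its set of cleaners:

```python
def victim_fails(order, victim, cleaners_of):
    """Evaluate the abstract outcome of the victim in a test order.

    cleaners_of: dict mapping each polluter to the set of its cleaners
    (its keys are exactly the polluters of the victim).
    """
    pos = {t: i for i, t in enumerate(order)}
    v = pos[victim]
    for p, cleaners in cleaners_of.items():
        if p in pos and pos[p] < v:
            cleaned = any(c in pos and pos[p] < pos[c] < v for c in cleaners)
            if not cleaned:
                return True
    return False

# Two polluters with different cleaner sets, as in the example of Section 2.
cleaners_of = {"p1": {"c1"}, "p2": {"c1", "c2"}}
print(victim_fails(["p1", "c2", "v"], "v", cleaners_of))  # True: c2 does not clean p1
print(victim_fails(["p2", "c2", "v"], "v", cleaners_of))  # False: c2 cleans p2
```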


This definition is an estimate of what one would obtain for all (repeated) runs 
of |T|! permutations, for three main reasons: (1) tests may behave differently in 
test orders than in isolation [24] (and an OD test may even be an ND test in 
some orders [24]); (2) polluters, cleaners, and state-setters may not be single 
tests but groups (iFixFlakies [37] reports that groups are rather rare); and (3) a 
test that fails in some prefix may behave differently for the tests that come 
after it in a test order than when the test passes (again, iFixFlakies [37] reports 
this issue to be rare, finding just one such case). Despite these potential sources 
of error, our evaluation shows that our use of abstraction obtains flake rates 
similar to iDFlakies for orders that iDFlakies ran. Most importantly, our use of 
abstraction allows us to evaluate many more orders without actually running 
them, thus taking much less machine time. 


4.2 Computing Flake Rate 


We next define flake rate, derive formulas for computing flake rate for two cases, 
and show why we need to sample test orders for other cases. 


Definition 2 (Flake Rate). For a test suite $T$ with exactly one victim, given
a set of test orders $\Omega(T)$, the flake rate is defined as the ratio:


$f(T) = |\{\omega \in \Omega(T) \mid fail(\omega)\}| / |\Omega(T)|$;


we use the subscripts $f_A$ and $f_C$ when we need to refer specifically to the flake
rate for $\Omega_A(T)$ and $\Omega_C(T)$ (defined in Section 3), respectively.

We derive the formula for flake rate based on the number of polluters $P$ and
cleaners $C$ for two special cases. In general, computing the flake rate can ignore
tests that are not relevant, i.e., not in $\{v\} \cup P \cup \bigcup_{p \in P} C_p$. It is easy to prove that
$f(T) = f(T')$ if $T$ and $T'$ have the same victim, polluters, and cleaners: the
tests from $T \setminus T'$ are irrelevant in any order and do not affect
the outcome of $v$; we omit the proof due to space limit. The further analysis
thus focuses only on the relevant tests.

Special Case 1: Assume that (A1) all polluters have the same set $C$ of cleaners:
$C = C_p, \forall p \in P$; and (A2) all of the victim, polluters, and cleaners are in the
same class: $\forall t, t' \in \{v\} \cup P \cup C.\ class(t) = class(t')$; it means that $\Omega_A(T) = \Omega_C(T)$
and $f_A = f_C$. Let $\pi = |P|$ and $\gamma = |C|$. The total number of permutations of
the relevant tests is $(\pi + \gamma + 1)!$. While we can obtain $|\{\omega \in \Omega(T) \mid fail(\omega)\}|$
purely by definition, counting test orders where the victim fails, we prefer to
take a probabilistic approach that will simplify further proofs. A victim fails
if (1) it is not in the first position, with probability $(\pi + \gamma)/(\pi + \gamma + 1)$, and
(2) its immediate predecessor is a polluter, with probability $\pi/(\pi + \gamma)$, giving
the overall flake rate $f(T) = \pi/(\pi + \gamma + 1)$. This formula is simple, but real
test suites often violate A1 or A2. Of the 249 tests used in our experiments, 13
violate both A1 and A2, 207 violate only A2, and only 29 do not violate either.
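
The Special Case 1 rate can be cross-checked by enumerating all permutations of a small set of relevant tests. This is a minimal sketch under A1 and A2; the test names ("v", "p0", "c0", ...) and function names are hypothetical. Under A1, the outcome reduces to whether the last relevant test before the victim is a polluter.

```python
from fractions import Fraction
from itertools import permutations

def victim_fails(order):
    """Under A1: the victim fails iff the shared state is polluted when it runs."""
    polluted = False
    for t in order:
        if t == "v":
            return polluted
        if t.startswith("p"):
            polluted = True
        elif t.startswith("c"):
            polluted = False
    raise ValueError("order does not contain the victim")

def brute_force_flake_rate(num_polluters, num_cleaners):
    tests = (["v"]
             + [f"p{i}" for i in range(num_polluters)]
             + [f"c{i}" for i in range(num_cleaners)])
    orders = list(permutations(tests))
    return Fraction(sum(victim_fails(o) for o in orders), len(orders))

def formula_flake_rate(num_polluters, num_cleaners):
    """Closed form: polluters / (polluters + cleaners + 1)."""
    return Fraction(num_polluters, num_polluters + num_cleaners + 1)

print(brute_force_flake_rate(2, 3), formula_flake_rate(2, 3))  # 1/3 1/3
```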
Special Case 2: Keeping A1 but relaxing A2, assume that the victim is in class
$C_1$ with $\pi_1$ polluters and $\gamma_1$ cleaners, and the other $k - 1$ classes have $\pi_i$ polluters
and $\gamma_i$ cleaners, $2 \le i \le k$, where in general, either $\pi_i$ or $\gamma_i$, but not both, can be
zero for any class except for the victim’s own class where both $\pi_1$ and $\gamma_1$ can be
zero. Per Special Case 1, we have $f_A(T) = (\sum_{i=1}^{k} \pi_i) / (\sum_{i=1}^{k} \pi_i + \sum_{i=1}^{k} \gamma_i + 1)$.
Next, consider class-compatible test orders, which do not interleave tests from
different classes. The victim fails if (1) it fails in its own class, with probability
$\pi_1/(\pi_1 + \gamma_1 + 1)$, or (2) the following three conditions hold: (2.1) the victim is
the first in its own class, with probability $1/(\pi_1 + \gamma_1 + 1)$, (2.2) the class is not
the first among classes, with probability $(k - 1)/k$, and (2.3) the immediately
preceding class ends with a polluter, with probability $\pi_i/(\pi_i + \gamma_i)$ for each class
$i$ and thus the probability $\sum_{i=2}^{k} (\pi_i/(\pi_i + \gamma_i))/(k - 1)$ across all classes. Overall,


$f_C(T) = \dfrac{\pi_1 + \frac{1}{k} \sum_{i=2}^{k} \frac{\pi_i}{\pi_i + \gamma_i}}{\pi_1 + \gamma_1 + 1}$.

The formula is already more complex. It is important to note that we can have
either $f_A(T) > f_C(T)$ or $f_C(T) > f_A(T)$, based on the ratio of polluters and
cleaners in the victim’s own class vs. the ratio of polluters and cleaners in other
classes, i.e., neither set of test orders ensures a higher flake rate. We show in
Section 4.3 that both cases arise in practice.
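
Combining the probabilities of cases (1) and (2.1) to (2.3) gives a rate for class-compatible orders of (π_1 + (1/k) Σ_{i≥2} π_i/(π_i + γ_i)) / (π_1 + γ_1 + 1). The sketch below evaluates this expression, compares it with exhaustive enumeration of class-compatible orders on tiny hypothetical suites, and shows that either set of orders can have the higher rate. All names are illustrative; index 0 in the lists stands for the victim's class.

```python
from fractions import Fraction
from itertools import permutations, product

def f_all(pis, gammas):
    """Flake rate over all permutations (Special Case 1 on pooled counts)."""
    return Fraction(sum(pis), sum(pis) + sum(gammas) + 1)

def f_class(pis, gammas):
    """Flake rate over class-compatible orders; index 0 is the victim's class."""
    k = len(pis)
    s = sum((Fraction(p, p + g) for p, g in zip(pis[1:], gammas[1:])), Fraction(0))
    return (pis[0] + s / k) / (pis[0] + gammas[0] + 1)

def victim_fails(order):
    polluted = False
    for t in order:
        if t == "v":
            return polluted
        polluted = t.startswith("p") or (polluted and not t.startswith("c"))
    raise ValueError("order does not contain the victim")

def brute_force_class(pis, gammas):
    classes = []
    for i, (p, g) in enumerate(zip(pis, gammas)):
        block = [f"p{i}_{j}" for j in range(p)] + [f"c{i}_{j}" for j in range(g)]
        classes.append(block + ["v"] if i == 0 else block)
    fails = total = 0
    for class_order in permutations(classes):
        for inner in product(*(permutations(c) for c in class_order)):
            total += 1
            fails += victim_fails([t for block in inner for t in block])
    return Fraction(fails, total)

# Cleaner with the victim, polluters elsewhere: all orders fail more often.
print(f_all([0, 2], [1, 0]), f_class([0, 2], [1, 0]))  # 1/2 1/4
# Polluter with the victim, cleaners elsewhere: class-compatible fail more often.
print(f_all([1, 0], [0, 2]), f_class([1, 0], [0, 2]))  # 1/4 1/2
```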

General Case: In the most general case, relaxing A1 to allow different polluters
to have a different set of cleaners, while also having all these relevant tests in
different classes, it appears challenging to derive a closed-form expression for
$f_A(T)$, let alone for $f_C(T)$. In Section 4.3, we thus resort to estimating flake rates
by sampling orders from $\Omega_A(T)$ or $\Omega_C(T)$ and counting what ratio of them fail
based on Definition 1.
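
A sampling-based estimate for the general case can be sketched as follows. Shuffling the tests inside each class and then shuffling the classes draws uniformly from the class-compatible orders. The suite, the cleaner sets, and the function names are hypothetical, and the outcome is decided by the abstraction alone, without running any test.

```python
import random

def sample_class_compatible(classes, rng):
    """Uniform class-compatible order: shuffle within classes, then the classes."""
    blocks = [rng.sample(c, len(c)) for c in classes]
    rng.shuffle(blocks)
    return [t for block in blocks for t in block]

def victim_fails(order, victim, cleaners_of):
    """Some polluter precedes the victim with none of its cleaners in between."""
    pos = {t: i for i, t in enumerate(order)}
    v = pos[victim]
    return any(pos[p] < v and not any(pos[p] < pos[c] < v for c in cleaners)
               for p, cleaners in cleaners_of.items())

def estimate_flake_rate(classes, victim, cleaners_of, samples=100_000, seed=0):
    rng = random.Random(seed)
    fails = sum(victim_fails(sample_class_compatible(classes, rng), victim, cleaners_of)
                for _ in range(samples))
    return fails / samples

# Hypothetical suite violating A1: p1 and p2 have different cleaner sets.
classes = [["v", "c1"], ["p1", "p2", "c2"]]
cleaners_of = {"p1": {"c1"}, "p2": {"c1", "c2"}}
print(estimate_flake_rate(classes, "v", cleaners_of, samples=20_000))
```

Sampling from all permutations instead only requires replacing the sampler with a single shuffle of the whole test list.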

Fig. 3. Distribution of flake rate for two sets of test orders.


4.3 Comparing Flake Rate for Different Sets of Test Orders 


While tools such as iDFlakies [22] incorporate the requirement of not interleaving 
tests from different classes in a test order, some other tools [47] do not incorporate 
this requirement, so they allow all test orders. Recall that $\Omega_A(T)$ denotes the
set of all test orders and $\Omega_C(T)$ denotes the set of test orders that satisfy the
requirement. The reason to run $\Omega_A(T)$ is to try to maximize the detection of all
potential OD tests at the risk that some detected failures would be false positives. 
In particular, a test failure observed in some non-class-compatible order may 
not be reproducible in any class-compatible prefix of that order, e.g., due to 
the various ways to customize JUnit [17] (with annotations such as @Before, 
@BeforeClass, @Rule) or similar testing frameworks. The reason to run only 
$\Omega_C(T)$ is to detect OD-test failures that developers can observe from running
the tests and are therefore motivated to fix. 

While both sets of test orders can detect all true positive OD tests, it is not
clear which set of test orders is more likely to detect true positive OD tests.
Intuitively, running $\Omega_A(T)$ test orders can more likely detect failures if cleaners
and victims are in the same class, while polluters are in different classes; in such
cases, polluters are less likely to come in between cleaners and the victim in
class-compatible test orders. For
example, for the victim presented in Section 2, the $\Omega_A(T)$ flake rate is 10.5%,
while the $\Omega_C(T)$ flake rate is 4.5%. On the other hand, running $\Omega_C(T)$ test
orders can more likely detect failures if polluters and victims are in the same 
class, while cleaners are in different classes. Similar reasoning applies to brittles: 
if state-setters are more often in the same test class as the brittle, then the brittle 
is less likely to fail than if state-setters are more often in other classes. 

To compare these sets of test orders on real OD tests, we use the dataset of 
192 victim and 57 brittle tests described in Section 3.1. We collect all single test 
polluters for each victim and all single test cleaners for each polluter-victim pair. 
We also collect all single test state-setters for the brittles. We then use either the 
formulas presented in Section 4.2 or a large number of uniformly sampled test 
orders to obtain the flake rates, $f_A(T)$ and $f_C(T)$, for each test. Specifically, our
formulas apply for 236 of the 249 tests. For the remaining 13 tests (all victims),
we sample 100,000 test orders from each of $\Omega_A(T)$ and $\Omega_C(T)$ to estimate their
flake rates. 

Figure 3 summarizes the results. For each set of test orders, the figure shows 
a boxplot that visualizes the distribution of flake rates for 249 OD tests. The 
$f_A(T)$ flake rates have a slightly higher mean (38.4%) than the $f_C(T)$ flake
rates (38.0%). Statistical tests for paired samples of the flake rates (the
dependent Student’s t-test obtains a p-value of 0.47, and the Wilcoxon signed-rank
test obtains a p-value of 0.01) show that the differences could be statistically
significant (at the $\alpha = 0.05$ level). However, if we omit the 13 tests that required
sampling, the means are 38.3% for $f_A(T)$ and 38.6% for $f_C(T)$, and the difference
is not statistically significant (dependent Student’s t-test obtains a p-value
of 0.55, and Wilcoxon signed-rank test obtains a p-value of 0.19).

Prior work [6,22,24,47] has not performed any explicit comparison between
the two sets of test orders. Our results demonstrate that running $\Omega_A(T)$ might
be more likely to detect true positive OD tests. However, the failures detected
with such test orders may include false positives. Future work on detecting OD
tests should explore how to address false positives if $\Omega_A(T)$ test orders are run.


4.4 Simple Change to Increase Probability of Detecting OD Tests 


Inspired by our probability analysis, we propose a simple change to increase 
the probability of detecting OD tests. The standard algorithm for sampling S 
random test orders simply repeats $S$ times the following steps: (1) $\omega \leftarrow$ sample a
random test order from possible test orders ($\Omega_A(T)$ or $\Omega_C(T)$); (2) obtain result
$r \leftarrow run(\omega)$; (3) if $r$ is FAIL, then print $\omega$. (A variant [22] may store previously
sampled test orders to avoid repetition, but the number of possible test orders 
is usually so large that sampling the same one is highly unlikely, so one can save 
space and time by not tracking previously sampled test orders.) 

Our key change is to select the next test order as a reverse of the prior test
order that passed: (4) if $r$ is PASS, then $\omega_R \leftarrow reverse(\omega)$. The intuition for this
change is that a passing order may have the polluter after the victim. Therefore,
reversing the passing order would have the polluter before the victim, and thus
the reverse of the passing order should have a higher probability to fail than a
random order that may have the polluter before or after the victim. Note that
the reverse of a class-compatible test order is also a class-compatible test order,
so this change applies to $\Omega_C(T)$. The other changes are to run $\omega_R$, print it if it fails,
and properly count the test orders to select exactly $S$ samples of test orders.
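
The modified sampling loop can be sketched as below. It is a minimal illustration that samples from all permutations; the runner `run` is a hypothetical callable that executes an order and returns True if every test passes, and the simulated runner stands in for a real test execution. The budget counts both random and reversed orders so that exactly the requested number of orders is run.

```python
import random

def sample_with_reverse(tests, run, budget, seed=0):
    """Run exactly `budget` orders; follow each passing random order by its reverse."""
    rng = random.Random(seed)
    failing = []
    executed = 0
    while executed < budget:
        order = rng.sample(tests, len(tests))  # uniform random permutation
        executed += 1
        if not run(order):
            failing.append(order)
        elif executed < budget:
            reverse = order[::-1]
            executed += 1
            if not run(reverse):
                failing.append(reverse)
    return failing

def simulated_run(order):
    """Toy suite: everything passes unless polluter 'p' runs before victim 'v'."""
    return order.index("p") > order.index("v")

print(sample_with_reverse(["v", "p", "x", "y"], simulated_run, budget=4))
```

In this toy suite without cleaners, every passing order has a failing reverse, so any budget of at least two orders finds a failure.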

We next compute the probability that the reverse of a passing order fails.
Special Case 1: Consider the Special Case 1 scenario from Section 4.2 with $\pi$
polluters and $\gamma$ cleaners. For the standard algorithm, $f(T) = f_A(T) = f_C(T) = \pi/(\pi + \gamma + 1)$.
For our change, the conditional probability that the second
test order fails given that the first test order passes is
$P(fail(\omega_R) \mid pass(\omega)) = P(fail(\omega_R) \wedge pass(\omega))/P(pass(\omega))$.
We already have $P(pass(\omega)) = 1 - f(T) = (\gamma + 1)/(\pi + \gamma + 1)$.

To compute $P(fail(\omega_R) \wedge pass(\omega))$, we consider two cases based on the position
of the victim in the passing test order $\omega$. (1) If the victim is first, with the
probability of $1/(\pi + \gamma + 1)$, then the second test should be a polluter, with the
probability of $\pi/(\pi + \gamma)$, so we get $\pi/((\pi + \gamma)(\pi + \gamma + 1))$ for this case. (2) If the
victim is not first, it cannot be the last in $\omega$ because otherwise, $\omega_R$ would not
fail, so the victim is in the middle, with the probability of $(\pi + \gamma - 1)/(\pi + \gamma + 1)$.
We also need a cleaner right before the victim, with probability $\gamma/(\pi + \gamma)$, and
a polluter right after the victim, with probability $\pi/(\pi + \gamma - 1)$. Overall, we get
the probability $\pi\gamma/((\pi + \gamma)(\pi + \gamma + 1))$ for this case. We can sum up the two
cases to get $P(fail(\omega_R) \wedge pass(\omega)) = \pi(\gamma + 1)/((\pi + \gamma)(\pi + \gamma + 1))$.

Finally, the conditional probability that the reverse test order fails given 
the first test order passes is P(fail(wp)|pass(w)) = (ee a EH) = 
m/(n +y). This probability is strictly larger than f(T) = m/(m +y+ 1), because 
am > 0 must be true for the victim to be a victim. 
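
The closed form for Special Case 1 can be checked by exhaustive enumeration for small numbers of polluters and cleaners. The sketch below is only a sanity check under the assumptions of this special case (every test is a polluter, a cleaner, or the victim); the names `fails` and `reverse_fail_given_pass` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def fails(order):
    """The victim 'v' fails iff the closest test before it is a polluter."""
    polluted = False
    for test in order:
        if test == "v":
            return polluted
        polluted = test.startswith("p")  # a polluter sets the state, a cleaner resets it
    raise ValueError("order does not contain the victim")


def reverse_fail_given_pass(num_polluters, num_cleaners):
    """Exact P(fail(reverse) | pass) over all permutations of the relevant tests."""
    tests = [f"p{i}" for i in range(num_polluters)]
    tests += [f"c{i}" for i in range(num_cleaners)]
    tests.append("v")
    passing = both = 0
    for order in permutations(tests):
        if not fails(order):
            passing += 1
            if fails(order[::-1]):
                both += 1
    return Fraction(both, passing)
```

For instance, `reverse_fail_given_pass(2, 3)` enumerates all 720 orders of two polluters, three cleaners, and the victim, and can be compared against $\pi/(\pi + \gamma)$.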

Special Case 2: For the Special Case 2 scenario from Section 4.2, the common
case is $\pi_1 + \gamma_1 > 0$ (i.e., the victim’s class $C_1$ has at least one other relevant
test). Based on the relative position of the victim in class $C_1$, we consider three
cases: the victim runs first, in the middle, or last in class $C_1$. After calculating
the probability for the three cases separately and summing them up, we get the
probability that the reverse test order fails and the first test order passes as


: — mitkriyiıtnı Syty (mity +1) Sa — k Ti 
P(fail(wr) A pass(w)) = Cea CTE EEN where Sy = Di2 wa 


and Sy = Eo E . In Section 4.2, we have computed P(pass(w)), so dividing 
$P(fail(\omega_R) \wedge pass(\omega))$ by $P(pass(\omega))$ gives the conditional probability that the
reverse test order fails given the first test order passes. Due to the complexity of
the formulas, it is difficult to show a detailed proof that $P(fail(\omega_R) \mid pass(\omega)) >
f(T)$, so we sample test orders instead.

When we sample both $\Omega_A(T)$ and $\Omega_C(T)$ for 100,000 random test orders
on all 249 OD tests without reverse (i.e., the standard algorithm) and with
reverse when a test order passes (i.e., our change), we find that our change
statistically significantly increases the chance to detect OD tests. Specifically, for
$\Omega_A(T)$, test orders without reverse obtain a mean flake rate of 38.6%, while test orders
with reverse of passing test orders obtain a mean flake rate of 45.3%. Statistical tests for
paired samples on the flake rates without and with reverse for $\Omega_A(T)$ show a
p-value of $\sim 10^{-88}$ for the dependent Student’s t-test and a p-value of $\sim 10^{-43}$
for the Wilcoxon signed-rank test. Similarly, for $\Omega_C(T)$, test orders without reverse
obtain a mean flake rate of 38.0%, while test orders with reverse of passing test orders
obtain a mean flake rate of 45.3%. Statistical tests for paired samples on the flake rates
without and with reverse for $\Omega_C(T)$ show a p-value of $\sim 10^{-4?}$ for the dependent
Student’s t-test and a p-value of $\sim 10^{-4?}$ for the Wilcoxon signed-rank test.

Based on these positive results, we have changed the iDFlakies tool [22] so 
that, by default, it runs the reverse of the previous order, instead of running a 
random order, if the previous order found no new flaky test. 


5 Generating Test Orders to Cover Test Pairs 


We next discuss our algorithm to generate test orders that systematically cover 
all test pairs for a given set T with n tests. The motivation is that even with our 
change to increase the probability to detect OD tests, the randomization-based 
sampling remains inherently probabilistic and can fail to detect an OD test. 


5.1 Special Case: All Orders are Class-Compatible


We first focus on the special case where we have only one class, or many classes
that each have only one test, so all $n!$ permutations are class-compatible. For
example, for $n = 2$ we can cover both pairs with $\Omega_2 = \{\langle t_1, t_2 \rangle, \langle t_2, t_1 \rangle\}$, and for
$n = 4$ we can cover all 12 pairs with 4 test orders $\Omega_4 = \{\langle t_1, t_4, t_2, t_3 \rangle, \langle t_2, t_1, t_3, t_4 \rangle,
\langle t_3, t_2, t_4, t_1 \rangle, \langle t_4, t_3, t_1, t_2 \rangle\}$. Recall that $n$ is the minimum number of test orders
needed to cover all test pairs, so the cases for $n = 2$ and $n = 4$ are optimal. The
reader is invited to consider for $n = 3$ whether we can cover all 6 test pairs with
just 3 test orders. The answer is upcoming in this section.

To address this problem, we consider Tuscan squares [7], objects studied in 
the field of combinatorics. Given a natural number n, a Tuscan square consists 
of n rows, each of which is a permutation of the numbers {1,2,...,n}, and every 
pair (i,j) of distinct numbers occurs consecutively in some row. Tuscan squares 
are sometimes called “row-complete Latin squares” [34], but note that Tuscan 
squares need not have each column be a permutation of all numbers. 

A Tuscan square of size $n$ is equivalent to a decomposition of the complete
graph on $n$ vertices, $K_n$, into $n$ Hamiltonian paths [42]. The decomposition
for even $n$ has been known since the 19th century and is often attributed to
Walecki [26]. The decomposition for odd $n \geq 7$ was published in 1980 by
Tillson [42]. Tillson presented a beautiful construction for $n = 4m + 3$ and a rather
involved construction for $n = 4m + 1$ with a recursive step and a manually
constructed base case for $n = 9$. In brief, Tuscan squares can be constructed for all
values of $n$ except $n = 3$ or $n = 5$. We did not find a public implementation for
generating Tuscan squares, and considering the complexity of the case $n = 4m + 1$
in Tillson’s construction, we have made our implementation public [44].

We can directly translate permutations from Tuscan squares into $n$ test orders
that cover all test pairs in this special case (where all test pairs are either only
intra-class test pairs of one class or only inter-class test pairs of $n$ classes).
These sets of test orders have the minimal possible cost: $Cost(\Omega_n) = n(Cost_0 +
Cost(T))$, substantially lower than $Cost(\Omega_p)$ for running all test pairs in isolation.
For $n = 3$ and $n = 5$, we have to use 4 and 6 test orders, respectively, to cover
all test pairs. For example, for $n = 3$ we can cover all 6 pairs with 4 orders
$\{\langle t_1, t_2, t_3 \rangle, \langle t_2, t_1, t_3 \rangle, \langle t_3, t_1 \rangle, \langle t_3, t_2 \rangle\}$.
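
For even $n$, one way to obtain such a square is the classical zigzag construction, sketched below together with a coverage check. This is an illustration only and not necessarily how the public implementation [44] works; the function names are hypothetical, tests are represented by the numbers $0, \ldots, n-1$, and odd $n$ (which needs Tillson's construction) is deliberately not handled.

```python
def zigzag_tuscan_square(n):
    """Rows of a Tuscan square for even n (zigzag first row, then cyclic shifts)."""
    if n % 2 != 0:
        raise ValueError("this sketch handles even n only")
    first = [0]
    for k in range(1, n):
        step = k if k % 2 == 1 else -k  # consecutive differences +1, -2, +3, ...
        first.append((first[-1] + step) % n)
    # All n - 1 differences in the first row are distinct modulo n, so the n
    # cyclic shifts of that row place every ordered pair side by side once.
    return [[(x + shift) % n for x in first] for shift in range(n)]


def covered_pairs(orders):
    """Set of consecutive (ordered) pairs that appear in the given orders."""
    return {(o[i], o[i + 1]) for o in orders for i in range(len(o) - 1)}
```

The same `covered_pairs` helper can be applied to the four orders listed above for $n = 4$ to confirm that they cover all 12 ordered pairs.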


5.2 General Case 


Algorithm 1 shows the pseudo-code algorithm to generate test orders that cover 
all test pairs in the general case where we have more than one class and at 
least one class has more than one test. The main function calls two functions to 
generate test orders that cover intra-class and inter-class test pairs. 

The function cover_intra_class_pairs generates test orders that cover all 
intra-class test pairs. For each class, the function compute_tuscan_square is 
used to generate test orders of tests within the class to cover all intra-class 
test pairs. These test orders for each class are then appended to form a test 
order for the entire test suite $T$. The function pick, invoked on multiple lines,
chooses a random element from a set. The outer loop iterates as many times
as the maximum number of intra-class test orders for any class. When the loop
finishes, $\Omega$ contains a set of test orders that cover all intra-class and some
inter-class test pairs. Each test order that concatenates tests from $l$ classes covers
$l - 1$ inter-class test pairs. (Using just these test orders, we already detected 44
new OD tests in the test suites from the iDFlakies dataset.) Each intra-class
test pair is covered by exactly one test order. Modulo the special cases for $n = 3$
and $n = 5$, each covered inter-class pair appears in exactly one test order in
$\Omega$, because Tuscan squares satisfy the invariant that each element appears only
once as the first and once as the last in the permutations in a Tuscan square.

Algorithm 1: Generate test orders that cover all intra-test-class and
inter-test-class test-method pairs

    Input: T   # test suite, a set of test methods partitioned into test classes
    Output: Ω  # output is a set of test orders
    Function cover_all_pairs():
        Ω = {}  # empty set
        cover_intra_class_pairs()
        cover_inter_class_pairs()
    Function cover_intra_class_pairs():
        map = {}  # map each class to all its intra-class orders
        for C ∈ classes(T) do
            map = map ∪ {(C, ω_C) | ω_C ∈ compute_tuscan_square(C)}
        while map ≠ {} do
            ω = ⟨⟩  # empty order
            Cs = {C | ∃ω_C. (C, ω_C) ∈ map}
            for C ∈ Cs do
                ω_C = pick({ω_C | (C, ω_C) ∈ map})
                map = map \ {(C, ω_C)}
                ω = ω ⊕ ω_C  # append order
            Ω = Ω ∪ {ω}
    Function cover_inter_class_pairs():
        pairs = {(t, t') | t, t' ∈ T ∧ class(t) ≠ class(t')} \  # from all inter-class pairs..
                {(t, t') | ∃ω ∈ Ω. cover(ω, (t, t'))}           # ..remove covered by intra-class orders
        while pairs ≠ {} do
            ω = pick(pairs)  # start with a randomly chosen not-covered pair
            pairs = pairs \ {ω}
            while true do
                t_p = ω_{|ω|−1}  # previously last test
                ts = {t | (t_p, t) ∈ pairs ∧ class(t) ∉ classes(ω)}
                if ts = {} then
                    break
                t_n = pick(ts)  # next test to extend order
                pairs = pairs \ {(t_p, t_n)}
                ω = ω ⊕ t_n
            Ω = Ω ∪ {ω}

The function cover_inter_class_pairs generates more test orders to cover
the remaining inter-class test pairs. It uses a greedy algorithm to first initialize a 
test order with a randomly selected not-covered test pair and then extend the test 
order with a randomly selected not-covered test pair as long as an appropriate 
test pair exists. Extending the test order as long as possible reduces both the 
number of test orders and the number of times each test needs to be run. 
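
The greedy step can be sketched in a few lines. The sketch below is an illustration under simplifying assumptions, not the authors' implementation: the parameter `class_of` (a mapping from test name to class name), the optional `covered` set of already covered pairs, and the sorting used to make the random choice reproducible are all choices made for this example.

```python
import random


def cover_inter_class_pairs(class_of, covered=frozenset(), seed=0):
    """Greedily build test orders that cover all not-yet-covered inter-class pairs."""
    rng = random.Random(seed)
    tests = sorted(class_of)
    pairs = {(a, b) for a in tests for b in tests
             if class_of[a] != class_of[b] and (a, b) not in covered}
    orders = []
    while pairs:
        first = rng.choice(sorted(pairs))  # start with a not-covered pair
        pairs.remove(first)
        order = list(first)
        used = {class_of[t] for t in order}
        while True:
            last = order[-1]
            # Tests that form a not-covered pair with the last test and whose
            # class does not appear in the order yet.
            candidates = sorted(t for (p, t) in pairs
                                if p == last and class_of[t] not in used)
            if not candidates:
                break
            nxt = rng.choice(candidates)
            pairs.remove((last, nxt))
            order.append(nxt)
            used.add(class_of[nxt])
        orders.append(order)
    return orders
```

Each generated order contains at most one test per class, so the consecutive pairs it covers are all inter-class pairs, and every pair is removed from `pairs` exactly once.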

We evaluate our randomized algorithm on 121 modules from the iDFlakies 
dataset as described in Section 3.1. We use the total cost, which considers the 
number of test orders and the number of tests in all of those test orders. The 
number of test orders is related to $Cost_0$, while the number of tests is related to
Cost, as defined in Section 3. We run our algorithm 10 times for various random 
seeds. The coefficient of variation [3] for each module shows that the algorithm 
is fairly stable, with the average for all modules being only 1.1% and 0.25% for 
the number of test orders and the number of tests, respectively. 

Compared with $\Omega_p$, which has all test orders of just test pairs, our randomized
algorithm’s average number of test orders and average number of tests are
only 3.68% and 51.8%, respectively, of those of the $\Omega_p$ test orders. The overall
cost of the test orders generated by our randomized algorithm is close to the
optimal, because the number of test orders is reduced by almost two orders of
magnitude, and 51.8% of the number of tests is close to the theoretical minimum
of 50% of that of the $\Omega_p$ test orders for Cost.


6 Conclusion 


Order-dependent (OD) tests are one prominent category of flaky tests. Prior 
work [22,24,47] has used randomized test orders to detect OD tests. In this 
paper, we have presented the first analysis of the probability that randomized 
test orders detect OD tests. We have also proposed a simple change for sampling 
random test orders to increase the probability of detecting OD tests. We have 
finally proposed a novel algorithm that systematically explores all consecutive 
pairs of tests, guaranteeing to find all OD tests that depend on one other test. 
Our experimental results show that our algorithm runs substantially fewer tests 
than a naive exploration that runs all pairs of tests. Our runs of some test 
orders generated by the algorithm detect 44 new OD tests, not detected in prior 
work [22,24,25] on the same evaluation dataset. 


Abstract. Timed automata (TA) have been shown to be a suitable formalism
for modeling real-time systems. Moreover, modern model-checking
tools allow a designer to check whether a TA complies with the system
specification. However, the exact timing constraints of the system are often
uncertain during the design phase. Consequently, the designer is able
to build a TA with a correct structure; however, the timing constraints
need to be tuned to make the TA comply with the specification.

In this work, we assume that we are given a TA together with an exis- 
tential property, such as reachability, that is not satisfied by the TA. We 
propose a novel concept of a minimal sufficient reduction (MSR) that 
allows us to identify the minimal set S of timing constraints of the TA 
that needs to be tuned to meet the specification. Moreover, we employ 
mixed-integer linear programming to actually find a tuning of S that 
leads to meeting the specification. 


Keywords: Timed Automata · Relaxation · Design · Reachability.


1 Introduction 


A timed automaton (TA) [4] is a finite automaton extended with a set of real-time
variables, called clocks, which capture the time. The clocks enrich the semantics,
and the constraints on the clocks restrict the behavior of the automaton; both
are particularly important in modeling time-critical systems. Examples of
TA models of critical systems include scheduling of real-time systems [30,29,33],
medical devices [43,38], and rail-road crossing systems [52].

Model-checking methods allow for verifying whether a given TA meets a given 
system specification. Contemporary model-checking tools, such as UPPAAL [17] 
or Imitator [9], have proved to be practically applicable on various industrial case 
studies [17,10,34]. Unfortunately, during the system design phase, the system
information is often incomplete. A designer is often able to build a TA with correct
structure, i.e., exactly capturing locations and transitions of the modeled system;
however, the exact clock (timing) constraints that enable/trigger the transitions
are uncertain. Thus, the produced TA often does not meet the specification (i.e.,
it does not pass the model-checking) and it needs to be fixed. If the specification
declares universal properties, e.g., safety or unavoidability, that need to hold on
each trace of the TA, a model-checker either returns “yes”, or it returns “no”
and generates a trace along which the property is violated. This trace can be
used to repair the model in an automated way [42]. However, in the case of
existential properties, such as reachability, the property has to hold on some trace of
the TA. The model-checker either returns “yes” and generates a witness trace
satisfying the property, or returns just “no” and does not provide any additional
information that would help the designer to correct the TA.


Contribution. In this paper, we study the following problem: given a timed
automaton $\mathcal{A}$ and a reachability property that is not satisfied by $\mathcal{A}$, relax clock
constraints of $\mathcal{A}$ such that the resultant automaton $\mathcal{A}'$ satisfies the reachability
property. Moreover, the goal is to minimize the number of the relaxed clock
constraints and, secondarily, also to minimize the overall change of the timing
constants used in the clock constraints. We propose a two-step solution for this
problem. In the first step, we identify a minimal sufficient reduction (MSR) of $\mathcal{A}$,
i.e., an automaton $\mathcal{A}''$ that satisfies the reachability property and originates from
$\mathcal{A}$ by removing only a minimal necessary set of clock constraints. In the second
step, instead of completely removing the clock constraints, we employ mixed-integer
linear programming (MILP) to find a minimal relaxation of the constraints
that leads to a satisfaction of the reachability property along a witness path.


The underlying assumption is that during the design the most suitable timing
constants reflecting the system properties are defined. Thus, our goal is to
generate a TA satisfying the reachability property by changing a minimum number
of timing constants. Some of the constraints of the initial TA can be strict (no
relaxation is possible), which can easily be integrated into the proposed solution.
Thus, the proposed method can be viewed as a way to handle design
uncertainties: develop a TA $\mathcal{A}$ on a best-effort basis and apply our algorithm to find an $\mathcal{A}'$
that is as close as possible to $\mathcal{A}$ and satisfies the given reachability property.


Related Work. Another way to handle uncertainties about timing constants 
is to build a parametric timed automaton (PTA), i.e., a TA where clock con- 
stants can be represented with parameters. Subsequently, a parameter synthesis 
tool, such as [46,9,26], can be used to find suitable values of the parameters for 
which the resultant TA satisfies the specification. However, most of the param- 
eter synthesis problems are undecidable [6]. While symbolic algorithms without 
termination guarantees exist for some subclasses [25,39,12], these algorithms are 
computationally very expensive compared to model checking (see [5]). Moreover, 
minimizing the number of modified clock constraints is not straightforward. 


A related TA repair problem has been studied in a recent work [7], where the 
authors also assumed that some of the constraints are incorrect. To repair the 
TA, they parametrized the initial TA and generated parameters by analyzing 
traces of the TA. However, the authors [7] do not focus on repairing the TA 
w.r.t. reachability properties as we do. Instead, their goal is to make the TA 
compliant with an oracle that decides if a trace of the TA belongs to a system 
or not. Thus, their approach cannot handle reachability properties. Furthermore 
in [7], the total change of the timing constraints is minimized, while we primarily 
minimize the number of changed constraints, then the total change. 

Fig. 1. An example of a timed automaton.


2 Preliminaries and Problem Formulation 


2.1 Timed Automata 


A timed automaton (TA) [3,4,44] is a finite-state machine extended with a finite
set $C$ of real-valued clocks. A clock $x \in C$ measures the time spent after its last
reset. In a TA, clock constraints are defined for locations (states) and transitions.
A simple clock constraint is defined as $x - y \sim c$ where $x, y \in C \cup \{0\}$, $\sim \in \{<, \leq\}$
and $c \in \mathbb{Z} \cup \{\infty\}$.³ Simple clock constraints and constraints obtained by combining
these with the conjunction operator ($\wedge$) are called clock constraints. The sets of
simple and all clock constraints are denoted by $\Phi_S(C)$ and $\Phi(C)$, respectively.
For a clock constraint $\phi \in \Phi(C)$, $S(\phi)$ denotes the simple constraints from $\phi$, e.g.,
$S(x - y < 10 \wedge y < 20) = \{x - y < 10, y < 20\}$. A clock valuation $v : C \to \mathbb{R}_+$
assigns a non-negative real value to each clock. The notation $v \models \phi$ denotes that
the clock constraint $\phi$ evaluates to true when each clock $x$ is replaced with $v(x)$.
For a clock valuation $v$ and $d \in \mathbb{R}_+$, $v + d$ is the clock valuation obtained by
delaying each clock by $d$, i.e., $(v + d)(x) = v(x) + d$ for each $x \in C$. For $\lambda \subseteq C$,
$v[\lambda := 0]$ is the clock valuation obtained after resetting each clock from $\lambda$, i.e.,
$v[\lambda := 0](x) = 0$ for each $x \in \lambda$ and $v[\lambda := 0](x) = v(x)$ for each $x \in C \setminus \lambda$.
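
To make these operations concrete, the following sketch models a clock valuation as a dictionary and a clock constraint as a list of simple constraints. The encoding is an assumption made for this example only: a simple constraint $x - y \sim c$ is the tuple `(x, y, op, c)` with `op` being `"<"` or `"<="`, and the string `"0"` stands for the constant 0.

```python
def satisfies(valuation, constraint):
    """Check v |= phi for a conjunction of simple constraints (x, y, op, c)."""
    def value(clock):
        return 0.0 if clock == "0" else valuation[clock]

    for x, y, op, c in constraint:
        diff = value(x) - value(y)
        holds = diff < c if op == "<" else diff <= c
        if not holds:
            return False
    return True


def delay(valuation, d):
    """The valuation v + d: every clock advances by d."""
    return {clock: t + d for clock, t in valuation.items()}


def reset(valuation, clocks):
    """The valuation v[clocks := 0]: the given clocks are set to 0."""
    return {clock: (0.0 if clock in clocks else t) for clock, t in valuation.items()}
```

For example, starting from the zero valuation over clocks `x` and `y`, delaying by 5, resetting `x`, and delaying by 7 yields `x = 7` and `y = 12`, which satisfies $x - y \leq 10 \wedge y < 20$.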


Definition 1 (Timed Automata). A timed automaton $\mathcal{A} = (L, l_0, C, \Delta, Inv)$
is a tuple, where $L$ is a finite set of locations, $l_0 \in L$ is the initial location, $C$ is
a finite set of clocks, $\Delta \subseteq L \times 2^C \times \Phi(C) \times L$ is a finite transition relation, and
$Inv : L \to \Phi(C)$ is an invariant function.


For a transition $e = (l_s, \lambda, \phi, l_t) \in \Delta$, $l_s$ is the source location, $l_t$ is the
target location, $\lambda$ is the set of clocks reset on $e$ and $\phi$ is the guard (i.e., a clock
constraint) tested for enabling $e$. The semantics of a TA is given by a labelled
transition system (LTS). An LTS is a tuple $\mathcal{T} = (S, s_0, \Sigma, \to)$, where $S$ is a set
of states, $s_0 \in S$ is an initial state, $\Sigma$ is a set of symbols, and $\to \subseteq S \times \Sigma \times S$ is
a transition relation. A transition $(s, a, s') \in \to$ is also shown as $s \xrightarrow{a} s'$.


Definition 2 (LTS semantics for TA). Given a TA $\mathcal{A} = (L, l_0, C, \Delta, Inv)$,
the labelled transition system $\mathcal{T}(\mathcal{A}) = (S, s_0, \Sigma, \to)$ is defined as follows:

- $S = \{(l, v) \mid l \in L, v \in \mathbb{R}_+^C, v \models Inv(l)\}$,
- $s_0 = (l_0, \mathbf{0})$, where $\mathbf{0}(x) = 0$ for each $x \in C$,
- $\Sigma = \{act\} \cup \mathbb{R}_+$, and
- the transition relation $\to$ is defined by the following rules:
  • delay transition: $(l, v) \xrightarrow{d} (l, v + d)$ if $v + d \models Inv(l)$
  • discrete transition: $(l, v) \xrightarrow{act} (l', v')$ if there exists $(l, \lambda, \phi, l') \in \Delta$ such that
    $v \models \phi$, $v' = v[\lambda := 0]$, and $v' \models Inv(l')$.

³ Simple constraints are only defined as upper bounds to ease the presentation. This
definition is not restrictive since $x - y \geq c$ and $x \geq c$ are equivalent to $y - x \leq -c$
and $0 - x \leq -c$, respectively. A similar argument holds for strict inequality ($>$).


The notation $s \to_d s'$ is used to denote a delay transition of duration $d$ followed
by a discrete transition from $s$ to $s'$, i.e., $s \xrightarrow{d} \xrightarrow{act} s'$. A run $\rho$ of $\mathcal{A}$ is either a
finite or an infinite alternating sequence of delay and discrete transitions, i.e.,
$\rho = s_0 \to_{d_0} s_1 \to_{d_1} s_2 \to_{d_2} \cdots$. The set of all runs of $\mathcal{A}$ is denoted by $[[\mathcal{A}]]$.

A path $\pi$ of $\mathcal{A}$ is an interleaving sequence of locations and transitions,
$\pi = l_0, e_1, l_1, e_2, \ldots$, where $e_{i+1} = (l_i, \lambda_{i+1}, \phi_{i+1}, l_{i+1}) \in \Delta$ for each $i \geq 0$. A path
$\pi = l_0, e_1, l_1, e_2, \ldots$ is said to be realizable if there exists a delay sequence $d_0, d_1, \ldots$
such that $(l_0, \mathbf{0}) \to_{d_0} (l_1, v_1) \to_{d_1} (l_2, v_2) \to_{d_2} \cdots$ is a run of $\mathcal{A}$ and for every $i \geq 1$,
the $i$th discrete transition is taken according to $e_i$, i.e., $e_i = (l_{i-1}, \lambda_i, \phi_i, l_i)$,
$v_{i-1} + d_{i-1} \models \phi_i$, $v_i = (v_{i-1} + d_{i-1})[\lambda_i := 0]$ and $v_i \models Inv(l_i)$.

Given a TA $\mathcal{A}$, a subset $L_T \subset L$ of its locations is reachable on $\mathcal{A}$ if there exists
$\rho = (l_0, \mathbf{0}) \to_{d_0} (l_1, v_1) \to_{d_1} \cdots \to_{d_{n-1}} (l_n, v_n) \in [[\mathcal{A}]]$ such that $l_n \in L_T$; otherwise,
$L_T$ is unreachable. The reachability problem is decidable and implemented in
various verification tools, e.g., [17,9]. The verifier either returns “No” when the
set $L_T$ is unreachable, or it generates a run (witness) reaching the set $L_T$.


Example 1. Figure 1 illustrates a TA with 8 locations: $\{l_0, \ldots, l_7\}$, 9 transitions:
$\{e_1, \ldots, e_9\}$, an initial location $l_0$, and an unreachable set of locations $L_T = \{l_4\}$.


2.2 Timed Automata Relaxations and Reductions 


For a timed automaton $\mathcal{A} = (L, l_0, C, \Delta, Inv)$, the set of pairs of transition and
associated simple constraints is defined in (1) and the set of pairs of location
and associated simple constraints is defined in (2).


$$\Psi(\Delta) = \{(e, \varphi) \mid e = (l_s, \lambda, \phi, l_t) \in \Delta,\ \varphi \in S(\phi)\} \quad (1)$$
$$\Psi(Inv) = \{(l, \varphi) \mid l \in L,\ \varphi \in S(Inv(l))\} \quad (2)$$


Definition 3 (constraint-relaxation). Let $\phi \in \Phi(C)$ be a constraint over $C$,
$\Theta \subseteq S(\phi)$ be a subset of its simple constraints and $r : \Theta \to \mathbb{N} \cup \{\infty\}$ be a positive
valued relaxation valuation. The relaxed constraint is defined as:


$$R(\phi, \Theta, r) = \Big( \bigwedge_{\varphi \in S(\phi) \setminus \Theta} \varphi \Big) \wedge \Big( \bigwedge_{\varphi = x - y \sim c \,\in\, \Theta} x - y \sim c + r(\varphi) \Big) \quad (3)$$


Intuitively, $R(\phi, \Theta, r)$ relaxes only the thresholds of simple constraints from $\Theta$
with respect to $r$, e.g., $R(x - y < 10 \wedge y < 20, \{y < 20\}, r) = x - y < 10 \wedge
y < 23$, where $r(y < 20) = 3$. Setting a threshold to $\infty$ implies removing the
corresponding simple constraint, e.g., $R(x - y < 10 \wedge y < 20, \{y < 20\}, r) =
x - y < 10$, where $r(y < 20) = \infty$. Note that $R(\phi, \Theta, r) = \phi$ when $\Theta$ is empty.
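
The relaxation operator can be illustrated with a short sketch that reproduces the two examples above. The encoding is an assumption made for this example: a simple constraint $x - y \sim c$ is the tuple `(x, y, op, c)`, the string `"0"` stands for the constant 0, `theta` is the set of constraints to relax, and `r` maps each of them to a natural number or `math.inf`.

```python
import math


def relax(constraint, theta, r):
    """Sketch of R(phi, Theta, r) on a list of simple constraints (x, y, op, c)."""
    relaxed = []
    for simple in constraint:
        if simple not in theta:
            relaxed.append(simple)  # constraints outside Theta are kept as they are
            continue
        x, y, op, c = simple
        if math.isinf(r[simple]):
            continue                # an infinite threshold removes the constraint
        relaxed.append((x, y, op, c + r[simple]))
    return relaxed
```

With `phi = [("x", "y", "<", 10), ("y", "0", "<", 20)]`, relaxing `("y", "0", "<", 20)` by 3 raises its threshold to 23, and relaxing it by `math.inf` leaves only the first constraint.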

Definition 4 ((D,I,r)-relaxation). Let $\mathcal{A} = (L, l_0, C, \Delta, Inv)$ be a TA, $D \subseteq
\Psi(\Delta)$ and $I \subseteq \Psi(Inv)$ be transition and location constraint sets, and $r : D \cup I \to
\mathbb{N} \cup \{\infty\}$ be a positive valued relaxation valuation. The (D,I,r)-relaxation of $\mathcal{A}$,
denoted $\mathcal{A}_{\langle D,I,r \rangle}$, is a TA $\mathcal{A}' = (L', l_0', C', \Delta', Inv')$ such that:

- $L = L'$, $l_0 = l_0'$, $C = C'$, and
- $\Delta'$ originates from $\Delta$ by relaxing $D$ via $r$. For $e = (l_s, \lambda, \phi, l_t) \in \Delta$, let
  $D|_e = \{\varphi \mid (e, \varphi) \in D\}$, and let $r|_e(\varphi) = r(e, \varphi)$, then
  $\Delta' = \{(l_s, \lambda, R(\phi, D|_e, r|_e), l_t) \mid e = (l_s, \lambda, \phi, l_t) \in \Delta\}$
- $Inv'$ originates from $Inv$ by relaxing $I$ via $r$. For $l \in L$, let
  $I|_l = \{\varphi \mid (l, \varphi) \in I\}$, and $r|_l(\varphi) = r(l, \varphi)$, then $Inv'(l) = R(Inv(l), I|_l, r|_l)$.


Intuitively, the TA $\mathcal{A}_{\langle D,I,r \rangle}$ emerges from $\mathcal{A}$ by relaxing the guards of the
transitions from the set $D$ and relaxing invariants of the locations from $I$ with
respect to $r$. In the special case of setting the threshold of each constraint from
$D$ and $I$ to $\infty$, i.e., when $r(a) = \infty$ for each $a \in D \cup I$, the corresponding
simple constraints are effectively removed, which is called a (D,I)-reduction and
denoted by $\mathcal{A}_{\langle D,I \rangle}$. Note that $\mathcal{A} = \mathcal{A}_{\langle \emptyset, \emptyset \rangle}$.


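Proposition 1. Let $\mathcal{A} = (L, l_0, C, \Delta, Inv)$ be a timed automaton, $D \subseteq \Psi(\Delta)$
and $I \subseteq \Psi(Inv)$ be sets of simple guard and invariant constraints, and $r : D \cup
I \to \mathbb{N} \cup \{\infty\}$ be a relaxation valuation. Then $[[\mathcal{A}]] \subseteq [[\mathcal{A}_{\langle D,I,r \rangle}]]$.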


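Proof. Observe that for a clock constraint $\phi \in \Phi(C)$, a subset of its simple
constraints $\Theta \subseteq S(\phi)$, a relaxation valuation $r'$ for $\Theta$, and the relaxed constraint
$R(\phi, \Theta, r')$ as in Definition 3, it holds that for any clock valuation $v$: $v \models \phi \implies
v \models R(\phi, \Theta, r')$. Now, consider a run $\rho = (l_0, \mathbf{0}) \to_{d_0} (l_1, v_1) \to_{d_1} (l_2, v_2) \to_{d_2} \cdots \in
[[\mathcal{A}]]$. Let $\pi = l_0, e_1, l_1, e_2, \ldots$ with $e_i = (l_{i-1}, \lambda_i, \phi_i, l_i) \in \Delta$ for each $i \geq 1$ be
the path realized as $\rho$ via delay sequence $d_0, d_1, \ldots$. By Definition 4, for each
$e = (l_s, \lambda, \phi, l_t) \in \Delta$, there is $(l_s, \lambda, R(\phi, D|_e, r|_e), l_t) \in \Delta'$. We define a path induced by
$\pi$ on $\mathcal{A}_{\langle D,I,r \rangle}$ as:


$$M(\pi) = l_0, (l_0, \lambda_1, R(\phi_1, D|_{e_1}, r|_{e_1}), l_1), l_1, (l_1, \lambda_2, R(\phi_2, D|_{e_2}, r|_{e_2}), l_2), \ldots \quad (4)$$


For each $i = 0, \ldots, n - 1$ it holds that $v_i \models R(Inv(l_i), I|_{l_i}, r|_{l_i})$, $v_i + d_i \models
R(Inv(l_i), I|_{l_i}, r|_{l_i})$ and $v_i + d_i \models R(\phi_{i+1}, D|_{e_{i+1}}, r|_{e_{i+1}})$. Thus $M(\pi)$ is realizable
on $\mathcal{A}_{\langle D,I,r \rangle}$ via the same delay sequence and $\rho \in [[\mathcal{A}_{\langle D,I,r \rangle}]]$. As $\rho \in [[\mathcal{A}]]$ is
arbitrary, we conclude that $[[\mathcal{A}]] \subseteq [[\mathcal{A}_{\langle D,I,r \rangle}]]$.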


2.3 Problem Statement 


Problem 1. Given a TA $\mathcal{A} = (L, l_0, C, \Delta, Inv)$ and a set of target locations $L_T \subset
L$ that is unreachable on $\mathcal{A}$, find a (D,I,r)-relaxation $\mathcal{A}_{\langle D,I,r \rangle}$ of $\mathcal{A}$ such that $L_T$
is reachable on $\mathcal{A}_{\langle D,I,r \rangle}$. Moreover, the goal is to identify a (D,I,r)-relaxation
that minimizes the number $|D \cup I|$ of relaxed constraints, and, secondarily, we
aim to minimize the overall change of the clock constraints $\sum_{c \in D \cup I} r(c)$.

We propose a two-step solution to this problem. In the first step, we identify
a subset $D \cup I$ of the simple constraints $\Psi(\Delta) \cup \Psi(Inv)$ such that $L_T$ is reachable
on the (D,I)-reduction $\mathcal{A}_{\langle D,I \rangle}$ and $|D \cup I|$ is minimized. Consequently, we
can obtain a witness path of the reachability on $\mathcal{A}_{\langle D,I \rangle}$ from the verifier. The
path would be realizable on $\mathcal{A}$ if we remove the constraints $D \cup I$. In the second
step, instead of completely removing the constraints $D \cup I$, we find a relaxation
valuation $r : D \cup I \to \mathbb{N} \cup \{\infty\}$ such that the path found in the first step is realizable
on $\mathcal{A}_{\langle D,I,r \rangle}$. To find $r$, we introduce relaxation parameters for constraints
in $D \cup I$. Subsequently, we solve an MILP problem to find a valuation of the
parameters, i.e., $r$, that makes the path realizable on $\mathcal{A}_{\langle D,I,r \rangle}$ and minimizes
$\sum_{c \in D \cup I} r(c)$. Note that it might be the case that the reduction $\mathcal{A}_{\langle D,I \rangle}$ contains
multiple realizable paths that lead to $L_T$, and another path might result in a
smaller overall change. Also, there might exist another candidate subset $D' \cup I'$
with $|D' \cup I'| = |D \cup I|$ that would lead to a smaller overall change. While
our approach can be applied to a number of paths and a number of candidate
subsets $D \cup I$, processing all of them can be practically intractable.


3 Minimal Sufficient (D,I)-Reductions 


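Throughout this section, we simply write a reduction when talking about a (D,I)-reduction
of $\mathcal{A}$. To name a reduction, we either simply use capital letters, e.g.,
$M, N, K$, or we use the notation $\mathcal{A}_{\langle D,I \rangle}$ to also specify the sets $D, I$ of simple
clock constraints. Given a reduction $N = \mathcal{A}_{\langle D,I \rangle}$, $|N|$ denotes the cardinality
$|D \cup I|$. Furthermore, $\mathcal{R}_{\mathcal{A}}$ denotes the set of all reductions. We define a partial
order relation $\sqsubseteq$ on $\mathcal{R}_{\mathcal{A}}$ as $\mathcal{A}_{\langle D,I \rangle} \sqsubseteq \mathcal{A}_{\langle D',I' \rangle}$ iff $D \cup I \subseteq D' \cup I'$. Similarly, we
write $\mathcal{A}_{\langle D,I \rangle} \sqsubset \mathcal{A}_{\langle D',I' \rangle}$ iff $D \cup I \subsetneq D' \cup I'$. We say that a reduction $\mathcal{A}_{\langle D,I \rangle}$ is
a sufficient reduction (w.r.t. $\mathcal{A}$ and $L_T$) iff $L_T$ is reachable on $\mathcal{A}_{\langle D,I \rangle}$; otherwise,
$\mathcal{A}_{\langle D,I \rangle}$ is an insufficient reduction. A crucial observation for our work is that the
property of being a sufficient reduction is monotone w.r.t. the partial order:


Proposition 2. Let $\mathcal{A}_{\langle D,I \rangle}$ and $\mathcal{A}_{\langle D',I' \rangle}$ be reductions such that $\mathcal{A}_{\langle D,I \rangle} \sqsubseteq
\mathcal{A}_{\langle D',I' \rangle}$. If $\mathcal{A}_{\langle D,I \rangle}$ is sufficient then $\mathcal{A}_{\langle D',I' \rangle}$ is also sufficient.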


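Proof. Note that $\mathcal{A}_{\langle D',I' \rangle}$ is a $(D' \setminus D, I' \setminus I)$-reduction of $\mathcal{A}_{\langle D,I \rangle}$. By Proposition 1,
$[[\mathcal{A}_{\langle D,I \rangle}]] \subseteq [[\mathcal{A}_{\langle D',I' \rangle}]]$, i.e., the run of $\mathcal{A}_{\langle D,I \rangle}$ that witnesses the
reachability of $L_T$ is also a run of $\mathcal{A}_{\langle D',I' \rangle}$.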


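Definition 5 (MSR). A sufficient reduction $\mathcal{A}_{\langle D,I \rangle}$ is a minimal sufficient
reduction (MSR) iff there is no $c \in D \cup I$ such that the reduction $\mathcal{A}_{\langle D \setminus \{c\}, I \setminus \{c\} \rangle}$
is sufficient. Equivalently, due to Proposition 2, $\mathcal{A}_{\langle D,I \rangle}$ is an MSR iff there is
no sufficient reduction $\mathcal{A}_{\langle D',I' \rangle}$ such that $\mathcal{A}_{\langle D',I' \rangle} \sqsubset \mathcal{A}_{\langle D,I \rangle}$.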


Recall that a reduction $\mathcal{A}_{\langle D,I \rangle}$ is determined by $D \subseteq \Psi(\Delta)$ and $I \subseteq \Psi(Inv)$.
Consequently, $|\mathcal{R}_{\mathcal{A}}| = 2^{|\Psi(\Delta) \cup \Psi(Inv)|}$. Moreover, there can be up to $\binom{k}{\lfloor k/2 \rfloor}$ MSRs
where $k = |\Psi(\Delta) \cup \Psi(Inv)|$ (see Sperner’s theorem [51]). Also note that the
minimality of a reduction does not mean a minimum number of simple clock
constraints that are reduced by the reduction; there can exist two MSRs, $M$ and
$N$, such that $|M| < |N|$. Since our overall goal is to relax $\mathcal{A}$ as little as possible,
we identify a minimum MSR, i.e., an MSR $M$ such that there is no MSR $M'$
with $|M'| < |M|$, and then use the minimum MSR for the MILP part (Section 4)
of our overall approach. There can be also up to $\binom{k}{\lfloor k/2 \rfloor}$ minimum MSRs.

Algorithm 1: Minimum MSR Extraction Scheme

    N ← A_⟨Ψ(Δ),Ψ(Inv)⟩; ℳ ← ∅; ℐ ← ∅
    while N ≠ null do
        M, ℐ ← shrink(N, ℐ)        // Algorithm 2
        ℳ ← ℳ ∪ {M}
        N, ℐ ← findSeed(M, ℳ, ℐ)   // Algorithm 3
    return M


Example 2. Assume the TA $\mathcal{A}$ and $L_T = \{l_4\}$ from Example 1 (Fig. 1). There
are 24 MSRs and 4 of them are minimum. For example, $\mathcal{A}_{\langle D,I \rangle}$ with $D =
\{(e_5, > 25)\}$ and $I = \{(l_3, u < 26)\}$ is a minimum MSR, and $\mathcal{A}_{\langle D',I' \rangle}$ with
$D' = \{(e_9, y < 15), (e_7, z < 15)\}$ and $I' = \{(l_6, x < 10)\}$ is a non-minimum MSR.


3.1 Base Scheme For Computing a Minimum MSR 


Algorithm 1 shows a high-level scheme of our approach for computing a minimum
MSR. The algorithm iteratively identifies an ordered set of MSRs, $|M_1| > |M_2| >
\cdots > |M_k|$, such that the last MSR $M_k$ is a minimum MSR. Each of the MSRs,
say $M_i$, is identified in two steps. First, the algorithm finds a seed, i.e., a reduction
$N_i$ such that $N_i$ is sufficient and $|N_i| < |M_{i-1}|$. Second, the algorithm shrinks $N_i$
into an MSR $M_i$ such that $M_i \sqsubseteq N_i$ (and thus $|M_i| \leq |N_i|$). The initial seed $N_1$
is $\mathcal{A}_{\langle \Psi(\Delta), \Psi(Inv) \rangle}$, i.e., the reduction that removes all simple clock constraints
(which makes all locations of $\mathcal{A}$ trivially reachable). Once there is no sufficient
reduction $N_i$ with $|N_i| < |M_{i-1}|$, we know that $M_{i-1} = M_k$ is a minimum MSR.

Note that the algorithm also maintains two auxiliary sets, $\mathcal{M}$ and $\mathcal{I}$, to store
all identified MSRs and insufficient reductions, respectively. The two sets are
used during the process of finding and shrinking a seed, which we describe below.


3.2 Shrinking a Seed 


Our approach for shrinking a seed N into an MSR M is based on two concepts: 
a critical simple clock constraint and a reduction core. 


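Definition 6 (critical constraint). Given a sufficient reduction $\mathcal{A}_{\langle D,I \rangle}$, a
simple clock constraint $c$ is critical for $\mathcal{A}_{\langle D,I \rangle}$ iff $\mathcal{A}_{\langle D \setminus \{c\}, I \setminus \{c\} \rangle}$ is insufficient.


Proposition 3. If $c \in D \cup I$ is critical for a sufficient reduction $\mathcal{A}_{\langle D,I \rangle}$ then
$c$ is critical for every sufficient reduction $\mathcal{A}_{\langle D',I' \rangle}$ with $\mathcal{A}_{\langle D',I' \rangle} \sqsubseteq \mathcal{A}_{\langle D,I \rangle}$. Moreover,
by Definitions 5 and 6, $\mathcal{A}_{\langle D,I \rangle}$ is an MSR iff every $c \in D \cup I$ is critical
for $\mathcal{A}_{\langle D,I \rangle}$.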

Algorithm 2: shrink(A_⟨D,I⟩, ℐ)

    X ← ∅
    while (D ∪ I) ≠ X do
        c ← pick a simple clock constraint from (D ∪ I) \ X
        if A_⟨D\{c},I\{c}⟩ ∉ ℐ and A_⟨D\{c},I\{c}⟩ is sufficient then
            ρ ← a witness run of the sufficiency of A_⟨D\{c},I\{c}⟩
            A_⟨D,I⟩ ← the reduction core of A_⟨D\{c},I\{c}⟩ w.r.t. ρ
        else
            X ← X ∪ {c}
            ℐ ← ℐ ∪ {N ∈ R_A | N ⊑ A_⟨D\{c},I\{c}⟩}
    return A_⟨D,I⟩, ℐ


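Proof. By contradiction, assume that $c$ is critical for $\mathcal{A}_{\langle D,I \rangle}$ but not for $\mathcal{A}_{\langle D',I' \rangle}$,
i.e., $\mathcal{A}_{\langle D \setminus \{c\}, I \setminus \{c\} \rangle}$ is insufficient and $\mathcal{A}_{\langle D' \setminus \{c\}, I' \setminus \{c\} \rangle}$ is sufficient. As $\mathcal{A}_{\langle D',I' \rangle} \sqsubseteq
\mathcal{A}_{\langle D,I \rangle}$, we have $\mathcal{A}_{\langle D' \setminus \{c\}, I' \setminus \{c\} \rangle} \sqsubseteq \mathcal{A}_{\langle D \setminus \{c\}, I \setminus \{c\} \rangle}$. By Proposition 2, if the
reduction $\mathcal{A}_{\langle D' \setminus \{c\}, I' \setminus \{c\} \rangle}$ is sufficient then $\mathcal{A}_{\langle D \setminus \{c\}, I \setminus \{c\} \rangle}$ is also sufficient.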
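
Proposition 3 justifies a simple deletion-based shrinking loop: once a constraint is found to be critical, it never has to be re-checked. The sketch below illustrates only this core idea and is not Algorithm 2 itself: it omits the reduction core and the cache of insufficient reductions, it represents a reduction simply as the set $D \cup I$ of removed constraints, and `is_sufficient` is a hypothetical stand-in for the call to the verifier (it must be monotone, as stated in Proposition 2).

```python
def shrink(seed, is_sufficient):
    """Shrink a sufficient set of removed constraints to a minimal sufficient one."""
    current = set(seed)
    critical = set()
    while current != critical:
        c = min(current - critical)  # deterministic pick, for this sketch only
        candidate = current - {c}
        if is_sufficient(candidate):
            current = candidate      # c is not needed: continue with the smaller reduction
        else:
            critical.add(c)          # c is critical and stays critical (Proposition 3)
    return current
```

The loop makes at most one verifier call per constraint in the seed; the reduction core described next lets the actual algorithm drop several constraints after a single successful check.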


Definition 7 (reduction core). Let $A_{\langle D,I\rangle}$ be a sufficient reduction, $\rho$ a witness
run of the sufficiency (i.e., reachability of $L_T$ on $A_{\langle D,I\rangle}$), and $\pi$ the path
corresponding to $\rho$. Furthermore, let $M(\pi) = l_0, e_1, \ldots, e_n, l_n$ be the path
corresponding to $\pi$ on the original TA $A$ defined as in (4). The reduction core of
$A_{\langle D,I\rangle}$ w.r.t. $\rho$ is the reduction $A_{\langle D',I'\rangle}$ where
$D' = \{(e,\varphi) \mid (e,\varphi) \in D \wedge e = e_i \text{ for some } 1 \le i \le n\}$ and
$I' = \{(l,\varphi) \mid (l,\varphi) \in I \wedge l = l_i \text{ for some } 0 \le i \le n\}$.


Intuitively, the reduction core of $A_{\langle D,I\rangle}$ w.r.t. $\rho$ reduces from $A$ only the
simple clock constraints that appear on the witness path in $A$.


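Proposition 4. Let $A_{\langle D,I\rangle}$ be a sufficient reduction, $\rho$ the witness of reachability
of $L_T$ on $A_{\langle D,I\rangle}$, and $A_{\langle D',I'\rangle}$ the reduction core of $A_{\langle D,I\rangle}$ w.r.t. $\rho$.
Then $A_{\langle D',I'\rangle}$ is a sufficient reduction and $A_{\langle D',I'\rangle} \sqsubseteq A_{\langle D,I\rangle}$.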


Proof. By Definition 7, $D' \subseteq D$ and $I' \subseteq I$, thus $A_{\langle D',I'\rangle} \sqsubseteq A_{\langle D,I\rangle}$. As for the
sufficiency of $A_{\langle D',I'\rangle}$, we only sketch the proof. Intuitively, both $A_{\langle D,I\rangle}$ and
$A_{\langle D',I'\rangle}$ originate from $A$ by only removing some simple clock constraints ($D \cup I$,
and $D' \cup I'$, respectively), i.e., the graph structure of $A_{\langle D,I\rangle}$ and $A_{\langle D',I'\rangle}$ is
the same; however, some corresponding paths of $A_{\langle D,I\rangle}$ and $A_{\langle D',I'\rangle}$ differ
in the constraints that appear on the paths. By Definition 7, the path $\pi$ that
corresponds to the witness run $\rho$ of $A_{\langle D,I\rangle}$ is also a path of $A_{\langle D',I'\rangle}$. Since
realizability of a path depends only on the constraints along the path, if $\pi$ is
realizable on $A_{\langle D,I\rangle}$ then $\pi$ is also realizable on $A_{\langle D',I'\rangle}$.
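
The reduction core of Definition 7 amounts to filtering the removed constraints by the witness path. The following is a minimal sketch, assuming that constraints are represented as (edge, constraint) and (location, constraint) pairs of strings; the function name and the sample identifiers are illustrative and do not come from the Tamus implementation.

```python
def reduction_core(D, I, path_edges, path_locations):
    """Return (D', I') as in Definition 7.

    D and I are sets of (edge, constraint) and (location, constraint) pairs;
    only the pairs whose edge or location lies on the witness path are kept.
    """
    edges = set(path_edges)
    locations = set(path_locations)
    D_core = {(e, phi) for (e, phi) in D if e in edges}
    I_core = {(l, phi) for (l, phi) in I if l in locations}
    return D_core, I_core


# Hypothetical reduction: two guard constraints and two invariant constraints removed.
D = {("e1", "x<=3"), ("e7", "y>=2")}
I = {("l1", "x<=5"), ("l9", "z<=1")}

# Hypothetical witness path l0, e1, l1, e2, l2.
D_core, I_core = reduction_core(D, I, ["e1", "e2"], ["l0", "l1", "l2"])
print(D_core, I_core)
```

Constraints attached to edges or locations off the witness path are put back, which is why the core never exceeds the reduction it was computed from.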


Our approach for shrinking a sufficient reduction $N$ is shown in Algorithm 2.
The algorithm iteratively maintains a sufficient reduction $A_{\langle D,I\rangle}$ and a set
$X$ of known critical constraints for $A_{\langle D,I\rangle}$. Initially, $A_{\langle D,I\rangle} = N$ and $X = \emptyset$.
In each iteration, the algorithm picks a simple clock constraint $c \in (D \cup I) \setminus X$
and checks the reduction $A_{\langle D\setminus\{c\},I\setminus\{c\}\rangle}$ for sufficiency. If $A_{\langle D\setminus\{c\},I\setminus\{c\}\rangle}$ is
insufficient, the algorithm adds $c$ to $X$. Otherwise, if $A_{\langle D\setminus\{c\},I\setminus\{c\}\rangle}$ is sufficient,
the algorithm obtains a witness run $\rho$ of the sufficiency from the verifier and
reduces $A_{\langle D,I\rangle}$ to the corresponding reduction core. The algorithm terminates
when $(D \cup I) = X$. An invariant of the algorithm is that every $c \in X$ is critical
for $A_{\langle D,I\rangle}$. Thus, when $(D \cup I) = X$, $A_{\langle D,I\rangle}$ is an MSR (Proposition 3).

Note that the algorithm also uses the set $\mathcal{I}$ of known insufficient reductions.
In particular, before calling a verifier to check a reduction for sufficiency
(line 4), the algorithm first checks (in a lazy manner) whether the reduction
is already known to be insufficient. Also, whenever the algorithm determines a
reduction $A_{\langle D\setminus\{c\},I\setminus\{c\}\rangle}$ to be insufficient, it adds $A_{\langle D\setminus\{c\},I\setminus\{c\}\rangle}$ and every $N$,
$N \sqsubseteq A_{\langle D\setminus\{c\},I\setminus\{c\}\rangle}$, to $\mathcal{I}$ (by Proposition 2, every such $N$ is also insufficient).
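
The shrinking loop can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the Tamus code: a reduction is a flat set of constraint names, the verifier is replaced by a toy oracle in which each path lists the constraints on it and the subset that blocks it, and the set of known insufficient reductions is kept as an explicit list of maximal elements rather than symbolically.

```python
def make_oracle(paths):
    """Toy verifier. paths maps a name to (constraints on the path, blocking subset).
    A reduction is sufficient iff it removes every blocking constraint of some path;
    the returned witness is the set of constraints on that path."""
    def check(removed):
        for on_path, blocking in paths.values():
            if blocking <= removed:
                return on_path
        return None
    return check


def shrink(seed, check, known_insufficient):
    current, critical = set(seed), set()
    while current != critical:
        c = min(current - critical)            # deterministic pick
        candidate = frozenset(current - {c})
        witness = None
        # Lazy check: skip the verifier if the candidate is already known insufficient.
        if not any(candidate <= bad for bad in known_insufficient):
            witness = check(candidate)
        if witness is not None:
            current = set(candidate & witness)  # reduction core w.r.t. the witness
        else:
            critical.add(c)
            known_insufficient.append(candidate)
    return frozenset(current), known_insufficient


paths = {
    "p1": (frozenset("abc"), frozenset("ab")),
    "p2": (frozenset("de"), frozenset("d")),
}
msr, insufficient = shrink(frozenset("abcde"), make_oracle(paths), [])
print(sorted(msr))
```

In this toy run the first successful check already yields a witness on path p2, so the core step discards a, b and c at once instead of testing them one by one.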


3.3 Finding a Seed 


We now describe the procedure findSeed. The input is the latest identified MSR
$M$, the set $\mathcal{M}$ of known MSRs, and the set $\mathcal{I}$ of known insufficient reductions.
The output is a seed, i.e., a sufficient reduction $N$ such that $|N| < |M|$, or
null if there is no seed. Let us denote by CAND the set of all candidates for a
seed, i.e., $\mathrm{CAND} = \{N \in \mathcal{R}_A \mid |N| < |M|\}$. A brute-force approach would be to
check individual reductions in CAND for sufficiency until a sufficient one is found;
however, this can be practically intractable since
$|\mathrm{CAND}| = \sum_{i=0}^{|M|-1} \binom{|\Psi(\Delta) \cup \Psi(Inv)|}{i}$.
We provide three observations to prune the set CAND of candidates that need
to be tested for being a seed. The first observation exploits the set $\mathcal{I}$ of already
known insufficient reductions: no $N \in \mathcal{I}$ can be a seed. The second observation
exploits the set $\mathcal{M}$ of already known MSRs. By the definition of an MSR, for
every $M' \in \mathcal{M}$ and every $N$ such that $N \sqsubsetneq M'$, the reduction $N$ is necessarily
insufficient and hence cannot be a seed. The third observation is stated below:


Observation 1. For every sufficient reduction $N \in \mathrm{CAND}$ there exists a sufficient
reduction $N' \in \mathrm{CAND}$ such that $N \sqsubseteq N'$ and $|N'| = |M| - 1$.


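Proof. If $|N| = |M| - 1$, then $N = N'$. For the other case, when $|N| < |M| - 1$,
let $N = A_{\langle D^N,I^N\rangle}$ and $M = A_{\langle D^M,I^M\rangle}$. We construct $N' = A_{\langle D^{N'},I^{N'}\rangle}$ by
adding arbitrary $(|M| - |N|) - 1$ simple clock constraints from $(D^M \cup I^M) \setminus (D^N \cup I^N)$
to $(D^N \cup I^N)$, i.e., $D^N \cup I^N \subseteq D^{N'} \cup I^{N'} \subseteq (D^M \cup I^M \cup D^N \cup I^N)$ and
$|D^{N'} \cup I^{N'}| = |M| - 1$. By definition of CAND, $N' \in \mathrm{CAND}$. Moreover, since $N \sqsubseteq N'$
and $N$ is sufficient, then $N'$ is also sufficient (Proposition 2).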


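Based on the above observations, we build a set $\mathcal{C}$ of indispensable candidates
for seeds that need to be tested for sufficiency:

$$\mathcal{C} = \{N \in \mathcal{R}_A \mid N \notin \mathcal{I} \;\wedge\; \forall M' \in \mathcal{M}.\ N \not\sqsubseteq M' \;\wedge\; |N| = |M| - 1\} \tag{5}$$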


The procedure findSeed, shown in Algorithm 3, in each iteration picks a
reduction $N \in \mathcal{C}$ and checks it for sufficiency (via the verifier). If $N$ is sufficient,
findSeed returns $N$ as the seed. Otherwise, when $N$ is insufficient, the algorithm
first attempts to enlarge $N$ into an insufficient reduction $E$ such that $N \sqsubseteq E$. By
Proposition 2, every reduction $N'$ such that $N' \sqsubseteq E$ is also insufficient, thus all
these reductions are subsequently added to $\mathcal{I}$ and hence removed from $\mathcal{C}$ (note
that this includes also $N$). If $\mathcal{C}$ becomes empty, then there is no seed.

The purpose of enlarging $N$ into $E$ is to quickly prune the candidate set $\mathcal{C}$. We
could just add all the insufficient reductions $\{N' \mid N' \sqsubseteq N\}$ to $\mathcal{I}$, but note that
$|\{N' \mid N' \sqsubseteq E\}|$ is exponentially larger than $|\{N' \mid N' \sqsubseteq N\}|$ w.r.t. $|E| - |N|$. The
enlargement, shown in Algorithm 4, works almost dually to shrinking. Let $N = A_{\langle D,I\rangle}$.
The algorithm attempts to add, one by one, the constraints from $\Psi(\Delta) \setminus D$
and $\Psi(Inv) \setminus I$ to $D$ and $I$, respectively, checking each emerged reduction for
sufficiency, and keeping only the changes that preserve $A_{\langle D,I\rangle}$ to be insufficient.
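
A minimal sketch of the enlargement step follows, under the same simplifying assumptions as before: reductions are flat sets of constraint names and the verifier is replaced by a hypothetical monotone predicate `is_sufficient`. It is meant only to show why enlarging pays off.

```python
def enlarge(reduction, all_constraints, is_sufficient):
    """Grow an insufficient reduction one constraint at a time,
    keeping an addition only if the result is still insufficient."""
    enlarged = set(reduction)
    for c in sorted(all_constraints - enlarged):
        if not is_sufficient(frozenset(enlarged | {c})):
            enlarged.add(c)
    return frozenset(enlarged)


# Toy predicate: a reduction is sufficient iff it contains one of these sets.
blocking_sets = [frozenset("ab"), frozenset("d")]

def is_sufficient(removed):
    return any(b <= removed for b in blocking_sets)


N = frozenset("c")
E = enlarge(N, frozenset("abcde"), is_sufficient)
print(sorted(E), 2 ** len(N), 2 ** len(E))
```

Here the single insufficient reduction {c} is enlarged to {a, c, e}, so one blocking step rules out 8 reductions instead of 2.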


3.4 Representation of Z and C 


The final piece of the puzzle is how to efficiently manipulate the sets $\mathcal{I}$ and
$\mathcal{C}$. In particular, we are adding reductions to $\mathcal{I}$ and $\mathcal{C}$, removing reductions from
$\mathcal{C}$, checking if a reduction belongs to $\mathcal{I}$, checking if $\mathcal{C}$ is empty, and picking a
reduction from $\mathcal{C}$. The problem is that the size of these sets can be exponential
w.r.t. $|\Psi(\Delta) \cup \Psi(Inv)|$ (there are exponentially many reductions), and thus,
it is practically intractable to maintain the sets explicitly. Instead, we use a
symbolic representation. Given a TA $A$ with simple clock constraints
$\Psi(\Delta) = \{(e_1, \varphi_1), \ldots, (e_p, \varphi_p)\}$ and $\Psi(Inv) = \{(l_1, \varphi_1), \ldots, (l_q, \varphi_q)\}$, we introduce two
sets $X = \{x_1, \ldots, x_p\}$ and $Y = \{y_1, \ldots, y_q\}$ of Boolean variables. Note that every
valuation of the variables $X \cup Y$ one-to-one maps to the reduction $A_{\langle D,I\rangle}$ such
that $(e_i, \varphi_i) \in D$ iff $x_i$ is assigned True and $(l_j, \varphi_j) \in I$ iff $y_j$ is assigned True.

The set $\mathcal{I}$ is gradually built during the whole computation of Algorithm 1.
To represent $\mathcal{I}$, we build a Boolean formula $\mathbb{I}$ such that a reduction $N$ does
not belong to $\mathcal{I}$ iff $N$ does correspond to a model of $\mathbb{I}$. Initially, $\mathcal{I} = \emptyset$, thus
$\mathbb{I} = \mathit{True}$. To add an insufficient reduction $A_{\langle D,I\rangle}$ and all reductions $N$,
$N \sqsubseteq A_{\langle D,I\rangle}$, to $\mathcal{I}$, we add to $\mathbb{I}$ the clause
$\big(\bigvee_{(e_i,\varphi_i) \in \Psi(\Delta) \setminus D} x_i\big) \vee \big(\bigvee_{(l_j,\varphi_j) \in \Psi(Inv) \setminus I} y_j\big)$.

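The set $\mathcal{C}$ is repeatedly built during each call of the procedure findSeed
based on Eq. 5 and it is encoded via a Boolean formula $\mathbb{C}$ such that every model
of $\mathbb{C}$ does correspond to a reduction $N \in \mathcal{C}$:

$$\mathbb{C} = \mathbb{I} \;\wedge \bigwedge_{A_{\langle D,I\rangle} \in \mathcal{M}} \Big( \bigvee_{(e_i,\varphi_i) \in \Psi(\Delta) \setminus D} x_i \;\vee \bigvee_{(l_j,\varphi_j) \in \Psi(Inv) \setminus I} y_j \Big) \;\wedge\; \mathrm{trues}(|M| - 1) \tag{6}$$

where $\mathrm{trues}(|M| - 1)$ is a cardinality encoding forcing that exactly $|M| - 1$ variables
from $X \cup Y$ are set to True. To check if $\mathcal{C} = \emptyset$ or to pick a reduction $N \in \mathcal{C}$,
we ask a SAT solver for a model of $\mathbb{C}$. To remove an insufficient reduction from
$\mathcal{C}$, we update the formula $\mathbb{I}$ (and thus also $\mathbb{C}$) as described above.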
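
To make the encoding concrete, the sketch below builds the blocking clauses for a tiny instance and enumerates the models of the candidate formula by brute force. The brute-force enumeration stands in for the SAT solver and the cardinality encoding only to keep the example dependency-free; the variable names and the sample reductions are hypothetical.

```python
from itertools import product

# One Boolean variable per simple clock constraint (x_i for guards, y_j for invariants).
VARIABLES = ["x1", "x2", "y1", "y2"]


def blocking_clause(reduction):
    """Clause excluding `reduction` and all of its sub-reductions:
    at least one variable outside the reduction must be True."""
    return [v for v in VARIABLES if v not in reduction]


def models(clauses, cardinality):
    """Yield the reductions whose valuation satisfies every clause
    and sets exactly `cardinality` variables to True."""
    for bits in product([False, True], repeat=len(VARIABLES)):
        valuation = dict(zip(VARIABLES, bits))
        if sum(bits) != cardinality:
            continue
        if all(any(valuation[v] for v in clause) for clause in clauses):
            yield frozenset(v for v in VARIABLES if valuation[v])


insufficient = [frozenset({"x1", "y1"})]     # contents of the insufficient set so far
known_msrs = [frozenset({"x2", "y1", "y2"})]  # latest MSR has size 3
clauses = [blocking_clause(r) for r in insufficient + known_msrs]
candidates = set(models(clauses, cardinality=3 - 1))
print(sorted(sorted(c) for c in candidates))
```

Each clause is positive, so adding a reduction to the blocked set never requires rewriting earlier clauses; the formula only grows.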


Algorithm 3: findSeed($M$, $\mathcal{M}$, $\mathcal{I}$)

 1  while $\{N \in \mathcal{R}_A \mid N \notin \mathcal{I} \wedge \forall M' \in \mathcal{M}.\ N \not\sqsubseteq M' \wedge |N| = |M| - 1\} \neq \emptyset$ do
 2      $N \leftarrow$ pick from $\{N \in \mathcal{R}_A \mid N \notin \mathcal{I} \wedge \forall M' \in \mathcal{M}.\ N \not\sqsubseteq M' \wedge |N| = |M| - 1\}$
 3      if $N$ is sufficient then return $N$, $\mathcal{I}$
 4      else $\mathcal{I} \leftarrow \mathcal{I} \cup \{N' \in \mathcal{R}_A \mid N' \sqsubseteq \mathrm{enlarge}(N)\}$
 5  return null, $\mathcal{I}$


Algorithm 4: enlarge($A_{\langle D,I\rangle}$)

 1  foreach $c \in (\Psi(\Delta) \cup \Psi(Inv)) \setminus (D \cup I)$ do
 2      if $c \in \Psi(\Delta)$ and $A_{\langle D \cup \{c\},I\rangle}$ is insufficient then $D \leftarrow D \cup \{c\}$
 3      if $c \in \Psi(Inv)$ and $A_{\langle D,I \cup \{c\}\rangle}$ is insufficient then $I \leftarrow I \cup \{c\}$
 4  return $A_{\langle D,I\rangle}$


3.5 Related Work


Although the concept of minimal sufficient reductions (MSRs) is novel in the
context of timed automata, similar concepts appear in other areas of computer
science. For example, see minimal unsatisfiable subsets [15], minimal correction
subsets [47], minimal inconsistent subsets [16,18], or minimal inductive validity
cores [32]. All these concepts can be generalized as minimal sets over monotone
predicates (MSMPs) [48,49]. The input is a reference set $R$ and a monotone
predicate $P : \mathcal{P}(R) \to \{1, 0\}$, and the goal is to find minimal subsets of $R$ that
satisfy the predicate. In the case of MSRs, the reference set is the set of all simple
constraints $\Psi(\Delta) \cup \Psi(Inv)$ and, for every $D \cup I \subseteq \Psi(\Delta) \cup \Psi(Inv)$, the predicate is
defined as $P(D \cup I) = 1$ iff $A_{\langle D,I\rangle}$ is sufficient. Many algorithms were proposed
(e.g., [45,14,19,22,20,47,21,37,32,23]) for finding MSMPs for particular instances
of the MSMP problem. However, the algorithms are dedicated to the particular
instances and extensively exploit specific properties of the instances (just as we
exploit reduction cores in the case of MSRs). Consequently, the algorithms either
cannot be used for finding MSRs, or they would be rather inefficient.


4 Synthesis of Relaxation Parameters 


The main objective of this study is to make the target locations $L_T$ of a given
TA $A = (L, l_0, C, \Delta, Inv)$ reachable by only modifying the constants of simple
constraints of $A$. In the previous section, we presented an efficient algorithm
to find a set of simple clock constraints $D \subseteq \Psi(\Delta)$ (1) (over transitions)
and $I \subseteq \Psi(Inv)$ (2) (over locations) such that the target set is reachable
when constraints $D$ and $I$ are removed from $A$. In other words, $L_T$ is
reachable on $A_{\langle D,I\rangle}$. Consequently, a verifier generates a finite run
$\rho'_{L_T} = (l_0, \mathbf{0}) \xrightarrow{d_0} (l_1, v_1) \xrightarrow{d_1} \cdots \xrightarrow{d_{n-1}} (l_n, v_n)$ of $A_{\langle D,I\rangle}$ such that $l_n \in L_T$. Let
$\pi'_{L_T} = l_0, e_1, l_1, \ldots, e_{n-1}, l_n$ be the corresponding path on $A_{\langle D,I\rangle}$, i.e., $\pi'_{L_T}$ is realizable
on $A_{\langle D,I\rangle}$ due to the delay sequence $d_0, d_1, \ldots, d_{n-1}$ and the resulting run is
$\rho'_{L_T}$. The corresponding path on the original TA $A$ defined as in (4) is:

$$\pi_{L_T} = M(\pi'_{L_T}), \text{ and } \pi_{L_T} = l_0, e_1, l_1, \ldots, e_{n-1}, l_n \tag{7}$$

While $\pi'_{L_T}$ is realizable on $A_{\langle D,I\rangle}$, $\pi_{L_T}$ is not realizable on $A$ since $L_T$ is not
reachable on $A$. We present an MILP based method to find a relaxation valuation
$r : D \cup I \to \mathbb{N} \cup \{\infty\}$ such that the path induced by $\pi_{L_T}$ is realizable on $A_{\langle D,I,r\rangle}$.

Given an automaton path $\pi = l_0, e_1, l_1, \ldots, e_{n-1}, l_n$ with $e_i = (l_{i-1}, \lambda_i, \phi_i, l_i)$
for each $i = 1, \ldots, n-1$, we introduce real valued delay variables $\delta_0, \ldots, \delta_{n-1}$ that
represent the time spent in each location along the path. Since clocks measure the
time passed since their last resets, for a fixed path, a clock on a given constraint
(invariant or guard) can be mapped to a sum of delay variables:

$$\Gamma(x, \pi, i) = \delta_k + \delta_{k+1} + \ldots + \delta_{i-1} \quad \text{where } k = \max(\{m \mid x \in \lambda_m, m < i\} \cup \{0\}) \tag{8}$$

The value of clock $x$ equals $\Gamma(x, \pi, i)$ on the $i$-th transition $e_i$ along $\pi$. In (8),
$k$ is the index of the transition where $x$ is last reset before $e_i$ along $\pi$, and it is
0 if it is not reset. $\Gamma(0, \pi, i)$ is defined as 0 for notational convenience.
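
As an illustration of (8), the sketch below computes which delay variables are summed for a given clock and transition index, and evaluates the sum for concrete delays. The dictionary-based representation of reset sets and the sample values are assumptions made for this example.

```python
def gamma_indices(clock, resets, i):
    """Indices j of the delay variables delta_j summed in Gamma(clock, pi, i).

    resets[m] is the set of clocks reset on transition e_m (m >= 1);
    k is the last reset of `clock` strictly before e_i, or 0 if there is none.
    """
    k = max([m for m, clocks in resets.items() if clock in clocks and m < i] + [0])
    return list(range(k, i))


def gamma_value(clock, resets, i, delays):
    """Value of `clock` when transition e_i is taken, for concrete delays."""
    return sum(delays[j] for j in gamma_indices(clock, resets, i))


# Hypothetical path with four transitions: x is reset on e_2 and y on e_4.
resets = {1: set(), 2: {"x"}, 3: set(), 4: {"y"}}
delays = [2.0, 1.5, 3.0, 0.5]

print(gamma_indices("x", resets, 4), gamma_value("x", resets, 4, delays))
print(gamma_indices("y", resets, 4), gamma_value("y", resets, 4, delays))
```

Because the path is fixed, each clock value is a linear expression over the delay variables, which is what allows guards and invariants to be written as linear constraints.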

Guards. For transition $e_i$, each simple constraint $\varphi = x - y \sim c \in \mathcal{S}(\phi_i)$ on
the guard $\phi_i$ is mapped to the new delay variables as:

$$\Gamma(x, \pi, i) - \Gamma(y, \pi, i) \sim c + p_{e_i,\varphi} \tag{9}$$

where $p_{e_i,\varphi}$ is a new integer valued relaxation variable if $(e_i, \varphi) \in D$, otherwise
it is set to 0.

Invariants. Each clock constraint $\varphi = x - y \sim c \in \mathcal{S}(Inv(l_i))$ of the invariant
of location $l_i$ is mapped to arriving (10) and leaving (11) constraints over the
delay variables, since the invariant should be satisfied when arriving and leaving
the location (and hence, due to the invariant convexity, also in the location).

$$\Gamma(x, \pi, i) \cdot \mathbb{1}(x \notin \lambda_i) - \Gamma(y, \pi, i) \cdot \mathbb{1}(y \notin \lambda_i) \sim c + p_{l_i,\varphi}, \text{ if } i > 0 \quad \text{(arriving)} \tag{10}$$

$$\Gamma(x, \pi, i+1) - \Gamma(y, \pi, i+1) \sim c + p_{l_i,\varphi} \quad \text{(leaving)} \tag{11}$$

where $\mathbb{1}$ is a binary function mapping true to 1 and false to 0, and $p_{l_i,\varphi}$ is a new
integer valued variable if $(l_i, \varphi) \in I$, otherwise it is set to 0.

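Finally, we define an MILP (12) for the path $\pi$. The constraint relaxation
variables $\{p_{l,\varphi} \mid (l, \varphi) \in I\}$ and $\{p_{e,\varphi} \mid (e, \varphi) \in D\}$ (integer valued), and the delay
variables $\delta_0, \ldots, \delta_{n-1}$ (real valued) are the decision variables of the MILP.

$$
\begin{aligned}
\text{minimize} \quad & \sum_{(l,\varphi) \in I} p_{l,\varphi} + \sum_{(e,\varphi) \in D} p_{e,\varphi} \\
\text{subject to} \quad & \text{(9) for each } i = 1, \ldots, n-1, \text{ and } x - y \sim c \in \mathcal{S}(\phi_i) \\
& \text{(10) for each } i = 1, \ldots, n, \text{ and } x - y \sim c \in \mathcal{S}(Inv(l_i)) \\
& \text{(11) for each } i = 0, \ldots, n-1, \text{ and } x - y \sim c \in \mathcal{S}(Inv(l_i)) \\
& p_{l,\varphi} \in \mathbb{Z}_+ \text{ for each } (l, \varphi) \in I, \text{ and } p_{e,\varphi} \in \mathbb{Z}_+ \text{ for each } (e, \varphi) \in D
\end{aligned}
\tag{12}
$$

Let $\{p^*_{l,\varphi} \mid (l, \varphi) \in I\}$, $\{p^*_{e,\varphi} \mid (e, \varphi) \in D\}$, and $\delta^*_0, \ldots, \delta^*_{n-1}$ denote the solution of
MILP (12). Define a relaxation valuation $r$ with respect to the solution as

$$r(l, \varphi) = p^*_{l,\varphi} \text{ for each } (l, \varphi) \in I, \quad r(e, \varphi) = p^*_{e,\varphi} \text{ for each } (e, \varphi) \in D. \tag{13}$$

Theorem 1. Let $A = (L, l_0, C, \Delta, Inv)$ be a timed automaton, $\pi = l_0, e_1, l_1, \ldots, e_n, l_n$
be a finite path of $A$, and $D \subseteq \Psi(\Delta)$, $I \subseteq \Psi(Inv)$ be guard and invariant
constraint sets. If the MILP constructed from $A$, $\pi$, $D$ and $I$ as defined in (12)
is feasible, then $l_n$ is reachable on $A_{\langle D,I,r\rangle}$ with $r$ as defined in (13).


Proof sketch. Let $\{p^*_{l,\varphi} \mid (l, \varphi) \in I\}$, $\{p^*_{e,\varphi} \mid (e, \varphi) \in D\}$, and $\delta^*_0, \ldots, \delta^*_{n-1}$ be
the optimal solution of MILP (12). Define clock value sequence $v_0, v_1, \ldots, v_n$
with respect to the path $\pi$ with $e_i = (l_{i-1}, \lambda_i, \phi_i, l_i)$ and the delay sequence
$\delta^*_0, \ldots, \delta^*_{n-1}$ iteratively as $v_0 = \mathbf{0}$ and $v_i = (v_{i-1} + \delta^*_{i-1})[\lambda_i := 0]$ for each
$i = 1, \ldots, n$. Along the path $\pi$, $v_i$ is consistent with $\Gamma(\cdot, \pi, i)$ (8) such that

$$\text{a) } v_i(x) = \Gamma(x, \pi, i) \cdot \mathbb{1}(x \notin \lambda_i) \quad \text{and} \quad \text{b) } v_i(x) + \delta^*_i = \Gamma(x, \pi, i+1) \tag{14}$$

MILP (12) constraints and (14) imply that the path $M(\pi)$ that ends in $l_n$ is
realizable on $A_{\langle D,I,r\rangle}$ via the delay sequence $\delta^*_0, \ldots, \delta^*_{n-1}$.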
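
The consistency claim (14) can be checked numerically on a small path. The sketch below simulates the clock valuations iteratively and compares them with the closed form from (8); the clocks, reset sets and delays are hypothetical values chosen for illustration and do not come from the paper.

```python
def gamma(clock, resets, i, delays):
    """Gamma(clock, pi, i) from (8) for concrete delays."""
    k = max([m for m in resets if clock in resets[m] and m < i] + [0])
    return sum(delays[k:i])


def valuations(clocks, resets, delays, n):
    """v_0 = 0 and v_i = (v_{i-1} + delta_{i-1})[lambda_i := 0] for i = 1..n."""
    v = [{c: 0.0 for c in clocks}]
    for i in range(1, n + 1):
        v.append({c: 0.0 if c in resets[i] else v[-1][c] + delays[i - 1]
                  for c in clocks})
    return v


clocks = ["x", "y"]
resets = {1: {"y"}, 2: {"x"}, 3: set()}   # lambda_1, lambda_2, lambda_3
delays = [2.0, 1.0, 4.0]                   # delta_0, delta_1, delta_2
n = 3
v = valuations(clocks, resets, delays, n)

# (14a): value on arrival, zero if the clock was just reset.
ok_a = all(v[i][c] == gamma(c, resets, i, delays) * (c not in resets[i])
           for i in range(1, n + 1) for c in clocks)
# (14b): value when leaving, i.e. when the next transition is taken.
ok_b = all(v[i][c] + delays[i] == gamma(c, resets, i + 1, delays)
           for i in range(0, n) for c in clocks)
print(v[n], ok_a, ok_b)
```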

A linear programming (LP) based approach was used in [27] to generate the
optimal delay sequence for a given path of a weighted timed automaton. In our
case, the optimization problem is in MILP form since we find an integer valued
relaxation valuation ($r$) in addition to the delay variables.

Recall that we construct relaxation sets $D$ and $I$ via Algorithm 1, and define
$\pi_{L_T}$ (7) that reaches $L_T$ such that the corresponding path $\pi'_{L_T}$ is realizable on
$A_{\langle D,I\rangle}$. Then, we define MILP (12) with respect to $\pi_{L_T}$, $D$ and $I$, and define
$r$ (13) according to the optimal solution. Note that this MILP is always feasible
since $\pi'_{L_T}$ is realizable on $A_{\langle D,I\rangle}$. Finally, by Theorem 1, we conclude that $L_T$
is reachable on $A_{\langle D,I,r\rangle}$.


Example 3. For the TA shown in Fig. 1, Algorithm 1 generates $A_{\langle D,I\rangle}$ with
$D = \{(e_5, x \ge 25)\}$ and $I = \{(l_3, u \le 26)\}$ such that $\pi' = l_0, e_1, l_1, e_2, l_2, e_3, l_1, e_4, l_3, e_5, l_4$
is realizable on $A_{\langle D,I\rangle}$. The MILP is constructed for $\pi$, $D$ and $I$ with decision
variables $p_{e_5, x \ge 25}$, $p_{l_3, u \le 26}$, $\delta_0, \delta_1, \delta_2, \delta_3, \delta_4$ and $\delta_5$ as in (12). The solution is
$p_{e_5, x \ge 25} = 3$, $p_{l_3, u \le 26} = 5$, and the delay sequence is 9, 4, 0, 9, 9, 0. Consequently,
$l_4$ is reachable on $A_{\langle D,I,r\rangle}$ with $r(e_5, x \ge 25) = 3$ and $r(l_3, u \le 26) = 5$.


5 Case Study 


We implemented the proposed reduction and relaxation methods in a tool called
Tamus. We use UPPAAL for sufficiency checks and witness computation, and the
CBC solver from the OR-Tools library [50] for the MILP part. All experiments were
run on a laptop with an Intel i5 quad core processor at 2.5 GHz and 8 GB RAM. The
tool and the benchmarks used are available at https://github.com/jar-ben/tamus.
As discussed in Section 1, an alternative approach to solve our problem (Problem 1)
is to parameterize each simple clock constraint of the TA. Then, we can
run a parameter synthesis tool on the parameterized TA to identify the set of
all possible valuations of the parameters for which the TA satisfies the reachability
property. Subsequently, we can choose the valuations that assign non-zero
values (i.e., relax) to the minimum number of parameters, and out of these, we
can choose the one with a minimum cumulative change of timing constants. In
our experimental evaluation, we evaluate a state-of-the-art parameter synthesis
tool called Imitator [9] to run such analysis. Although Imitator is not tailored
for our problem, it allows us to measure the relative scalability of our approach
compared to a well-established synthesis technique.


Table 1. Results for the scheduler TA, where $|\Psi| = |\Psi(\Delta) \cup \Psi(Inv)|$ is the total number
of constraints, $d = |D \cup I|$ is the minimum MSR size, $v$ is the number of reachability
checks, $t$ is the computation time in seconds (including the reachability checks), and
$c_m$ is the optimal cost of (12).

Model      |Ψ|  d    v     t  c_m | Model      |Ψ|  d    v     t  c_m | Model      |Ψ|   d     v      t  c_m
A(3,1,12)   11  2   33  0.18    6 | A(5,1,12)   16  3  120  0.63   10 | A(7,1,12)   19   3   120   0.63   11
A(3,2,12)   17  1   13  0.13   13 | A(5,2,12)   24  1   42  0.35   13 | A(7,2,12)   28   1    95   0.72   13
A(3,1,18)   16  3   61  0.37    9 | A(5,1,18)   23  4  149  0.90   16 | A(7,1,18)   28   5   313   1.87   25
A(3,2,18)   24  1   40  0.40    6 | A(5,2,18)   35  1   57  0.58    6 | A(7,2,18)   42   1    70   0.74    6
A(3,1,24)   21  4   97  0.65   12 | A(5,1,24)   31  6  327  2.16   24 | A(7,1,24)   38   7   709   4.76   35
A(3,2,24)   32  1   80  0.85   16 | A(5,2,24)   47  2  169  1.80   31 | A(7,2,24)   57   2   201   2.21   21
A(3,1,30)   26  5  141  1.05   15 | A(5,1,30)   39  7  541  4.17   31 | A(7,1,30)   48  10  1680  14.12   47
A(3,2,30)   40  1   65  0.84    9 | A(5,2,30)   59  2  330  3.95   14 | A(7,2,30)   72   ?     ?      ?    ?
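
The two-level selection rule used in the comparison with parameter synthesis (fewest relaxed parameters first, then smallest cumulative change) is a lexicographic minimum. The sketch below assumes, purely for illustration, that the synthesized valuations are available as a finite list of dictionaries; a synthesis tool would typically return a symbolic constraint instead, and the parameter names are hypothetical.

```python
def best_valuation(valuations):
    """Pick the valuation that relaxes the fewest parameters;
    ties are broken by the total change of the timing constants."""
    def key(valuation):
        relaxed = sum(1 for change in valuation.values() if change != 0)
        return (relaxed, sum(valuation.values()))
    return min(valuations, key=key)


candidates = [
    {"p1": 3, "p2": 5, "p3": 0},
    {"p1": 0, "p2": 9, "p3": 0},
    {"p1": 0, "p2": 0, "p3": 12},
    {"p1": 1, "p2": 1, "p3": 1},
]
chosen = best_valuation(candidates)
print(chosen)
```

Note that the last candidate has the smallest total change but relaxes three parameters, so it is not selected: the number of relaxed constraints takes priority.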


We used two collections of benchmarks: one is taken from the literature, and
the other consists of crafted timed automata modeling a machine scheduling problem.
All experiments were run using a time limit of 20 minutes per benchmark.


Machine Scheduling A scheduler automaton is composed of a set of paths
from location $l_0$ to location $l_1$. Each path $\pi = l_0 e_k l_k e_{k+1} \cdots l_{k+M-1} e_{k+M} l_1$
represents a particular scheduling scenario where an intermediate location, e.g. $l_i$
for $i = k, \ldots, k + M - 1$, belongs to a unique path (only one incoming and one
outgoing transition). Thus, a TA that has $p$ paths with $M$ intermediate locations
in each path has $M \cdot p + 2$ locations and $(M + 1) \cdot p$ transitions. Each intermediate
location represents a machine operation, and periodic simple clock constraints
are introduced to mimic the limitations on the corresponding durations. For
example, assume that the total time to use machines represented by locations
$l_{k+i}$ and $l_{k+i+1}$ is upper (or lower) bounded by $c$ for $i = 0, 2, \ldots, M - 2$. To
capture such a constraint with a period of $t = 2$, a new clock $x$ is introduced
and it is reset and checked on every $t$-th transition along the path, i.e., for every
$m \in \{i \cdot t + k \mid i \cdot t < M - 1\}$, let $e_m = (l_m, \lambda_m, \phi_m, l_{m+1})$, add $x$ to $\lambda_m$, set
$\phi_m := \phi_m \wedge x \le c$ ($x \ge c$ for lower bound). A periodic constraint is denoted by
$(t, c, \sim)$, where $t$ is its period, $c$ is the timing constant, and $\sim \in \{<, \le, >, \ge\}$.
A set of such constraints is defined for each path to capture possible restrictions.
In addition, a bound $T$ on the total execution time is captured with the
constraint $x \le T$ on transition $e_{k+M}$ over a clock $x$ that is not reset on any
transition. A realizable path to $l_1$ represents a feasible scheduling scenario, thus
the target set is $L_T = \{l_1\}$. We have generated 24 test cases. A test case $A_{(c,p,M)}$
represents a timed automaton with $c \in \{3, 5, 7\}$ clocks, and $p \in \{1, 2\}$ paths
with $M \in \{12, 18, 24, 30\}$ intermediate locations in each path. $R_{c,i}$ is the set of
periodic restrictions defined for the $i$-th path of an automaton with $c$ clocks:

…)}    $R_{3,2} = \{(4, 17, \ge), (5, 20, \le)\}$
…)}    $R_{5,2} = R_{3,2} \cup \{(8, 33, \ge), (9, 36, \le)\}$
…)}    $R_{7,2} = R_{5,2} \cup \{(12, 49, \ge), (12, 52, \le)\}$

Note that $A_{(c,2,M)}$ emerges from $A_{(c,1,M)}$ by adding a path with restrictions $R_{c,2}$.


Table 2. Experimental results for the benchmarks, where $|\Psi|$, $d$, $v$, $t$ and $c_m$ are as
defined in Table 1, $|\Psi^u|$ is the number of constraints considered in the analysis and $m$ is
the number of mutated constraints. $t^{I}$, $t^{I}_{T}$, $t^{I,c}$ and $t^{I,c}_{T}$ are the Imitator computation
times, where $c$ indicates that the early termination flag ("counterexample") is used,
otherwise the largest set of parameters is searched, and $T$ indicates that only the
constraints from the MSR identified by Tamus are parametrized, otherwise all constraints
from $\Psi^u$ are parametrized. to shows that the timeout limit is reached (20 min.). We
ran Imitator with the flag "incl". Note that when run with the flag "merge", the
performance of Imitator increases on 2 benchmarks, however, it decreases on other 2
benchmarks.

Model         Source    Spec.    |Ψ|  |Ψ^u|  d   m    v      t  c_m    t^I  t^I_T  t^{I,c}  t^{I,c}_T
accel1000     [11][35]  reach.  7690     13  2   3   22   1.83    -  182.5   2.08     1.77       1.03
CAS           [2]       reach.    18     18  2   9   46   0.31   16   0.75   0.11     0.09       0.01
coffee        [12]      reach.    10     10  2   3   18   0.07   14  0.008  0.002    0.007      0.003
Jobshop4      [1]       reach.    64     48  5   5  272   1.99    -     to  949.5       to      942.3
Pipeline3-3   [41]      reach.    41     41  1  12   42   0.37    -     to   0.08       to       0.05
RCP           [28]      reach.    42     42  1  11  181   2.51    -     to   0.02    24.23       0.02
SIMOP3        [8]       reach.    80     80  6  40  903  10.65    -     to   7.26       to       0.49
Fischer       [36]      safety    24     16  1   0   14   0.08    -     to     to     0.21       0.01
JLR13-3tasks  [40][13]  safety    42     36  1   0   40   0.41    -     to   2.60     0.05       0.08
WFAS          [24][31]  safety    32     24  1   0   10   0.08    -  16.20   0.01     0.03      0.006
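
The location and transition counts stated for the scheduler automata, and the positions at which a periodic constraint is placed, can be reproduced with a short sketch. It builds only the graph skeleton (no clocks or guards); the location names and helper functions are hypothetical and are not taken from the benchmark generator.

```python
def scheduler_skeleton(p, M):
    """Locations and transitions of a scheduler TA with p paths,
    each with M intermediate locations between l0 and l1."""
    locations = ["l0", "l1"]
    transitions = []
    for path in range(p):
        inner = [f"m{path}_{j}" for j in range(M)]
        locations += inner
        chain = ["l0"] + inner + ["l1"]
        transitions += list(zip(chain, chain[1:]))
    return locations, transitions


def periodic_positions(t, M):
    """Offsets i*t (relative to the first transition of a path)
    at which a constraint with period t is reset and checked."""
    return [i * t for i in range(M) if i * t < M - 1]


locations, transitions = scheduler_skeleton(p=2, M=12)
print(len(locations), len(transitions))   # M*p + 2 and (M + 1)*p
print(periodic_positions(2, 12))
```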

Table 1 shows results achieved by Tamus on these models. Tamus solved all
models and the hardest one, $A_{(7,1,30)}$, took only 14.12 seconds. As expected, the
computation time $t$ increases with the number $|\Psi|$ of simple clock constraints in
the model. Moreover, the computation time highly correlates with the size $d$ of
the minimum MSR. In particular, if we compare two generic models $A_{(c,1,M)}$ and
$A_{(c,2,M)}$, although $A_{(c,2,M)}$ has one more path and more constraints, Tamus is
faster on $A_{(c,2,M)}$ since it quickly converges to the path with smaller MSRs.

Imitator solved $A_{(3,1,12)}$, $A_{(3,2,12)}$, $A_{(3,1,18)}$, and $A_{(5,1,12)}$ within 0.08, 0.5,
61, and 67 seconds, and timed out for the other models. In addition, we ran
Imitator with a flag "counterexample" that terminates the computation when a
satisfying valuation is found. The use of this flag reduced the computation time
for the aforementioned cases, and it allowed solving two more models: $A_{(3,2,18)}$
and $A_{(5,2,12)}$. However, using this flag, Imitator often did not provide a solution
that minimizes the number of relaxed simple clock constraints.


Benchmarks from Literature We collected 10 example models from the literature
that include models with a safety specification that requires avoiding a set
of locations $L_A$, and models with a reachability specification with a set of target
locations $L_T$ as considered in this paper. In both cases, the original models
satisfy the given specification. For the first case, we define $L_A$ as the target set and
apply our method. Here, we find the minimal number of timing constants that
should be changed to reach $L_A$, i.e., to violate the original safety specification.
For the second case, inspired by mutation testing [2], we change a number of
constraints on the original model so that $L_T$ becomes unreachable. Eight of the
examples are networks of TAs, and while a network of TAs can be represented
as a single product TA and hence our approach can handle it, Tamus currently
supports only MSR computation for networks of TAs, but not MILP relaxation.

The results are shown in Table 2. Tamus computed a minimum MSR for all
the models and also provided the MILP relaxation for the non-network models.
Note that the bottleneck of our approach is the MSR computation and especially
the verifier calls; the MILP part always took only a few milliseconds (including
models from Table 1), thus we believe that this would also be the case for the
networks of TAs. The base variant of Imitator that computes the set of all
satisfying parameter valuations solved only 4 of the 10 models. When run with
the early termination flag, Imitator solved 3 more models; however, as discussed
above, the provided solutions might not be optimal. We have also evaluated
a combination of Tamus and Imitator. In particular, we first ran Tamus to
compute a minimum MSR $A_{\langle D,I\rangle}$, then parameterized the constraints $D \cup I$
in the original TA $A$, and ran Imitator on the parameterized TA. In this case,
Imitator solved 9 out of 10 models. Moreover, we have the guarantee that we
found the optimal solution: the MSR ensures that we relax the minimum number
of simple clock constraints, and Imitator finds all satisfying parameterizations of
the constraints, hence also the one with minimum cumulative change of timing
constants.


Conclusion In this work, we proposed the novel concept of a minimum MSR 
for a TA, that is a minimum set of simple constraints that need to be relaxed 
to satisfy a reachability specification. We developed efficient methods to find 
a minimum MSR, and presented an MILP based solution to tune these con- 
straints. Our analysis on benchmarks showed that our tool Tamus can generate 
a minimum MSR within seconds even for large systems. In addition, we com- 
pared our results with Imitator and observed that Tamus scales much better. 
However, Tamus minimizes the cumulative change of the constraints from a min- 
imum MSR by considering a single witness path. If the goal is to find a minimal 
relaxation globally, i.e., w.r.t. all witness paths for the MSR, we recommend
using the combined version of Tamus and Imitator, i.e., first run Tamus to find a
minimum MSR, parametrize each constraint from the MSR and run Imitator to
find all satisfying parameter valuations, including the global optimum.

Abstract. We study semi-algorithms to synthesise the constraints under
which a Parametric Timed Automaton satisfies some liveness requirement.
The algorithms traverse a possibly infinite parametric zone
graph, searching for accepting cycles. We provide new search and prun- 
ing algorithms, leading to successful termination for many examples. We 
demonstrate the success and efficiency of these algorithms on a bench- 
mark. We also illustrate parameter synthesis for the classical Bounded 
Retransmission Protocol. Finally, we introduce a new notion of complete- 
ness in the limit, to investigate if an algorithm enumerates all solutions. 


Keywords: Parameter Synthesis, Liveness Properties, IMITATOR 


1 Introduction 


Many critical devices and processes in our society are controlled by software, 
in which real-time aspects often play a crucial role. Timed Automata (TA [1]) 
are an important formalism to design and study real-time systems; they extend 
finite automata with real-valued clocks. Their success is based on the decidability 
of the basic analysis problems of checking reachability and liveness properties. 
Precise timing information is often unknown during the design phase. There- 
fore, Parametric Timed Automata (PTA [2]) extend TA with parameters, rep- 
resenting unknown waiting times, deadlines, network speed, etc. A single PTA 
represents an infinite class of TA. To facilitate design exploration, parameter 
constraint synthesis aims at a description of all parameter values for which the 
system meets some requirement. Unfortunately, it is already undecidable to check 
if a PTA admits a parameter valuation for which a bad state can be reached [2,3].
In this paper, we study the parameter constraint synthesis problem for live- 
ness properties of the full class of PTA. In particular, the goal is to compute 
the parameter valuations for which a Parametric Timed Büchi Automaton has
a non-empty language. Note that this allows handling requirements in LTL and
MITL [24]. We represent the solution concisely as a disjunction of conjunctions
of linear inequalities between the parameters (a set of convex polyhedra).

We will consider semi-algorithms that operate on the so-called parametric
zone graph (PZG), where a parametric zone is a conjunction of linear inequalities
over clock and parameter values. These semi-algorithms may not terminate since
the PZG can be infinite. However, even in that case, we are interested in the
soundness and completeness of the set of all enumerated solutions.

Our contributions to the parameter constraint synthesis for liveness of PTA are:
1) A definition of soundness and completeness for non-terminating algorithms.
2) A new synthesis algorithm, using bounded search with iterative deepening;
   this is the first algorithm that enumerates all accepting cycles in the possibly
   infinite PZG, in contrast to previous NDFS-based algorithms [25].
3) An experimental benchmark, comparing the successful termination and runtime
   efficiency of all algorithms.
4) A case study on the Bounded Retransmission Protocol.


Related Work. Decidability for (subclasses of) PTA has been extensively stud- 
ied [2,19,3]. We study the emptiness and related synthesis problem for Paramet- 
ric Timed Büchi Automata with unrestricted use of rational parameters and 
real-valued clocks. In this general case, the model checking problem is undecid- 
able [2] and therefore exact synthesis is out of reach (in contrast to the setting 
with bounded integers [20,11]). Decidability of liveness properties for a subclass 
of PTA, where the occurrence of parameters is restricted, is discussed in [8]. 

Our approach inherits basic techniques from Timed Automata, in particular 
the zone graph. For TA, the zone graph is finite after LU-abstraction [27,23,17]. 
Another technique prunes states that are subsumed by larger states. Subsump- 
tion must be applied with care, in order to preserve liveness properties [22,18]. 

Previous semi-algorithms were based on Nested Depth-First Search (NDFS). 
They search the (possibly infinite) parametric zone graph (PZG) for accepting 
cycles. Their zones are projected onto the parameters and accumulated into the 
global constraint. The basic cumulative algorithm [11] prunes states whose pro- 
jected zone is already included in the accumulated constraint. The cumulative 
algorithm was extended with subsumption and layering for PTA [25]. The prob- 
lem with all NDFS-based algorithms is that the computation can diverge in one 
branch, missing solutions for accepting cycles in other branches forever. 

Our main improvement is a bounded approach, which can be combined with 
breadth- and depth-first search. We check for accepting cycles up to a certain 
bound, and keep increasing the bound to achieve completeness in the limit. 
Eventually, this will enumerate all parametric constraints corresponding to all 
accepting cycles in the PZG. Sometimes, the combination of bounded search 
and subsumption can even identify infinite paths that do not form a cycle, but 
this is not guaranteed. A previous proposal for Bounded Model Checking for 
PTA [21] considers the region graph and has not been implemented. We will 
provide several small illustrative examples inspired by the invited talk [26]. 
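
The idea of bounded search with an increasing bound can be illustrated on a plain explicit graph. The sketch below is a strong simplification and not the algorithm of this paper: it ignores zones, parameter constraints and subsumption, and simply looks for a cycle through an accepting node among paths of bounded length. The successor function, with one infinite branch explored first, is a hypothetical example chosen to show why an unbounded depth-first search could diverge.

```python
def bounded_cycle_search(init, succ, accepting, bound):
    """Depth-first search over paths with at most `bound` edges.
    Returns a cycle containing an accepting node, or None."""
    def dfs(path):
        if len(path) - 1 >= bound:
            return None
        for nxt in succ(path[-1]):
            if nxt in path:
                cycle = path[path.index(nxt):]
                if any(node in accepting for node in cycle):
                    return cycle
            else:
                found = dfs(path + [nxt])
                if found:
                    return found
        return None
    return dfs([init])


def iterative_search(init, succ, accepting, max_bound):
    """Increase the bound step by step until an accepting cycle is found."""
    for bound in range(1, max_bound + 1):
        cycle = bounded_cycle_search(init, succ, accepting, bound)
        if cycle:
            return bound, cycle
    return None


def succ(n):
    if n == 0:
        return [100, 1]   # the infinite branch is explored first
    if n == 1:
        return [2]
    if n == 2:
        return [1]
    return [n + 1]        # 100 -> 101 -> 102 -> ... never closes a cycle


result = iterative_search(0, succ, accepting={2}, max_bound=10)
print(result)
```

Each bounded pass abandons the infinite branch once the bound is hit, so the cycle in the second branch is eventually reported.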

To evaluate our algorithms, we implemented them in the IMITATOR toolset 
[6], extending its functionality from reachability to liveness properties. This way, 
we can reuse its PTA benchmark [4]. We also reimplemented the algorithms 
of [11,25] in a single NDFS framework. We illustrate our method on the Bounded 


Iterative Bounded Synthesis for Efficient Cycle Detection 313 


Retransmission Protocol (BRP). We synthesize parameter constraints for live- 
ness properties of BRP for the first time. Our constraints are more liberal than 
the constraints reported in previous work [14,19]. 


2 PTA, Parametric Zone Graphs and Accepted Runs 


Let $X$ be a set of real-valued clocks (e.g. $x, y$) and let $P$ be a set of rational
parameters (e.g. $p, q$). A linear term over parameters (plt) is an expression of
the form $\sum_i \alpha_i p_i + \beta$, where $p_i \in P$, and coefficients $\alpha_i, \beta \in \mathbb{Q}$. A (diagonal)
inequality is of the form $x_1 - x_2 \bowtie \mathit{plt}$, with $x_i \in X \cup \{0\}$ and $\bowtie \in \{<, \le, =, \ge, >\}$.
Examples are $x - y < 2p + q$, $x > q - 1$ and $2 < p$. A (convex) constraint (or
zone $Z$) is a conjunction of inequalities. We write $C$ for the set of zones.


We define a PTA $\mathcal{A} = (L, \ell_0, F, I, E)$, where $L$ is a finite set of locations, 
$\ell_0 \in L$ is the initial location and $F \subseteq L$ is the set of accepting locations. 
$I : L \to \mathcal{C}$ denotes an invariant for each location and $E$ is a set of 
transitions of the form $(\ell, g, R, \ell')$, with source $\ell \in L$, 
target $\ell' \in L$, guard $g \in \mathcal{C}$ and clock reset $R \subseteq X$. 

Fig. 1. PTA $\mathcal{A}_1$ 

The concrete semantics of a PTA is defined in terms of valuations. A parameter 
valuation is a function $v : P \to \mathbb{Q}_{\ge 0}$ and a clock valuation is a function 
$w : X \to \mathbb{R}_{\ge 0}$. Let $d \in \mathbb{R}_{\ge 0}$ be a delay, then we define 
the clock valuation $w + d$ such that $(w + d)(x) := w(x) + d$. Let $R \subseteq X$ be 
a clock reset, then we define the clock valuation $w[R](x) := 0$ if $x \in R$ and 
$w(x)$ otherwise. We write $\mathbf{0}$ for the clock valuation s.t. $\forall x \in X : \mathbf{0}(x) = 0$. We 
extend parameter valuations to linear terms. We write $v, w \models (x_i - x_j \bowtie plt)$ iff 
$w(x_i) - w(x_j) \bowtie v(plt)$, and $v, w \models Z$ iff $v, w \models e$ for all inequalities $e$ in $Z$. 

Given a parameter valuation $v$, we write $v(\mathcal{A})$ for the timed automaton obtained 
by replacing all parameters $p$ in invariants and guards by $v(p)$. The concrete 
semantics of a PTA $\mathcal{A}$ is derived from the TA $v(\mathcal{A})$, and defined as a 
timed transition system with states $(\ell, w)$, initial state $(\ell_0, \mathbf{0})$ (we assume that 
$\mathbf{0} \models I(\ell_0)$), and a transition relation $\to$ built from continuous time delays ($\xrightarrow{d}$) and 
discrete transitions ($\xrightarrow{e}$), which are defined as 

– If $d \in \mathbb{R}_{\ge 0}$ and $w + d \models I(\ell)$, then $(\ell, w) \xrightarrow{d} (\ell, w + d)$. 

– If $e = (\ell, g, R, \ell') \in E$ and $w \models g$ and $w[R] \models I(\ell')$ then $(\ell, w) \xrightarrow{e} (\ell', w[R])$. 

An infinite run $(\ell_0, w_0) \to (\ell_1, w_1) \to \cdots$ is accepted if it passes through an 
accepting location infinitely often, i.e. the set $\{i \mid \ell_i \in F\}$ is infinite. We ignore 
the problem of Zeno runs, which can be avoided by a syntactic transformation [9]. 


Example 1. The PTA $\mathcal{A}_1$ in Fig. 1 has locations $\{\ell_0, \ell_1\}$, clocks $\{x, y\}$ and 
parameter $p$. Only $\ell_1$ is accepting. The initial location $\ell_0$ has an invariant consisting 
of two inequalities. Its self-loop is enabled if $x \ge 1$ and it resets clock $x$. Note 
that clock $y$ is never reset. For $p = 2.5$, we have the following example run: 

$(\ell_0, (0,0)) \to (\ell_0, (1,1)) \to (\ell_0, (0,1)) \to (\ell_0, (1,2)) \to (\ell_1, (1,2))$. 

Note that the accepting location $\ell_1$ would not be reachable for $p < 2$. On the 
other hand, for all $p \ge 2$, there exists an infinite accepted run through $\ell_1$. 
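
The run of Example 1 can be replayed with a small explicit-state sketch of the concrete semantics. This is only an illustration: Fig. 1 is not reproduced here, so the invariant of `l0` ($x \le 1 \wedge y \le p$) and the guard $y \ge 2$ on the edge into `l1` are assumptions chosen to be consistent with Examples 1 and 2, and the names `invariant`, `EDGES`, `delay` and `discrete` are hypothetical.

```python
from fractions import Fraction

# Hypothetical TA v(A) loosely modelled on Example 1; clock valuations are (x, y).
def invariant(loc, w, p):
    x, y = w
    # Assumed invariant of l0: x <= 1 and y <= p; l1 is assumed unconstrained.
    return (x <= 1 and y <= p) if loc == "l0" else True

# Edges are (source, guard, reset set, target).
EDGES = [
    ("l0", lambda w: w[0] >= 1, {"x"}, "l0"),  # self-loop, resets x
    ("l0", lambda w: w[1] >= 2, set(), "l1"),  # assumed guard into l1
]

def delay(state, d, p):
    """Delay step: allowed if d >= 0 and the invariant holds after waiting."""
    loc, (x, y) = state
    w2 = (x + d, y + d)
    return (loc, w2) if d >= 0 and invariant(loc, w2, p) else None

def discrete(state, edge, p):
    """Discrete step: guard must hold before, target invariant after the reset."""
    loc, w = state
    src, guard, reset, tgt = edge
    if loc != src or not guard(w):
        return None
    w2 = (0 if "x" in reset else w[0], 0 if "y" in reset else w[1])
    return (tgt, w2) if invariant(tgt, w2, p) else None
```

Under these assumptions, alternating `delay(..., 1, p)` and the two edges with `p = Fraction(5, 2)` reproduces the run above, while for a smaller value such as `Fraction(3, 2)` the second delay is blocked by the invariant.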
We will now recall from [5,20] the parametric zone graph (PZG), providing 
an abstract semantics to a PTA. A single PZG treats all parameter valuations 
symbolically. Also, the PZG avoids the uncountably infinite timed transition 
system. The PZG can still be (countably) infinite. 

We first define some operations on zones, in terms of their valuations. It is 
well known that convex polyhedra are closed under these operations, and our 
implementation in IMITATOR uses the Parma Polyhedra Library [10]. 


– Time elapse: $Z^{\nearrow}$ corresponds to $\{(v, w + d) \mid d \in \mathbb{R}_{\ge 0} \wedge v, w \models Z\}$. 
– Clock reset: $Z[R]$ corresponds to $\{(v, w[R]) \mid v, w \models Z\}$. 


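The PZG is a transition system where each abstract state consists of a location 
and a non-empty zone. The PZG of $\mathcal{A} = (L, \ell_0, F, I, E)$ is $(S, s_0, \Rightarrow, A)$, 
with $S \subseteq L \times \mathcal{C}$, initial state $s_0 = (\ell_0, (\bigwedge_{x \in X} x = 0)^{\nearrow} \cap I(\ell_0))$, and accepting 
states $A = \{(\ell, Z) \mid \ell \in F\}$. A transition step $(\ell, Z) \Rightarrow (\ell', Z')$ exists if for some 
$(\ell, g, R, \ell') \in E$ we have $Z' = ((Z \cap g)[R] \cap I(\ell'))^{\nearrow} \cap I(\ell') \neq \emptyset$. We write $\Rightarrow^+$ 
($\Rightarrow^*$) for the transitive (respectively reflexive-transitive) closure of $\Rightarrow$. 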


Example 2. The PZG of $\mathcal{A}_1$ from Ex. 1 is shown in Fig. 2; it extends infinitely 
to the right. We use that $(x = 0 \wedge y = 0)^{\nearrow} = (y - x = 0)$. The loop on $\ell_0$ can 
only be executed when $x = 1$, and it resets $x := 0$, while $y$ is never reset. So after 
$n$ executions of the loop, $y - x = n$. These $n$ steps are only possible if $p \ge n$. 
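
The claim of Example 2 can be checked by unfolding the successor computation once. The derivation below is a sketch under the assumption that the invariant of $\ell_0$ is $x \le 1 \wedge y \le p$ and that the self-loop has guard $x \ge 1$ and reset $\{x\}$; Fig. 1 is not reproduced here, so these are assumptions consistent with Ex. 1 and Ex. 2 rather than a transcription. $Z_n$ denotes the zone of the $\ell_0$ state reached after $n$ executions of the loop.

```latex
\begin{align*}
Z_n &= (y - x = n \;\wedge\; 0 \le x \le 1 \;\wedge\; y \le p) \\
Z_n \cap (x \ge 1) &= (x = 1 \;\wedge\; y = n + 1 \;\wedge\; y \le p) \\
(Z_n \cap (x \ge 1))[\{x\}] &= (x = 0 \;\wedge\; y = n + 1 \;\wedge\; y \le p) \\
Z_{n+1} &= (y - x = n + 1 \;\wedge\; 0 \le x \le 1 \;\wedge\; y \le p) \\
Z_{n+1}{\downarrow}_P &= (p \ge n + 1)
\end{align*}
```

The last line holds because some $x \in [0, 1]$ with $n + 1 + x \le p$ exists exactly when $p \ge n + 1$, which matches the chain of parametric constraints along the PZG.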


The PZG obeys two important properties (Prop. 1 and 2). First, the para- 
metric constraint can only decrease along the transitions in the PZG. Second, 
a state simulates the behaviour of any state that it subsumes. We first define 
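these notions. We write $Z \subseteq Z'$ iff $v, w \models Z$ implies $v, w \models Z'$. 


– Parametric constraint: $(\ell, Z){\downarrow}_P$ corresponds to $\{v \mid \exists w.\ v, w \models Z\}$. 
– Subsumption: $(\ell, Z) \sqsubseteq (\ell', Z')$ iff $\ell = \ell'$ and $Z \subseteq Z'$. 


Proposition 1 ([25]). If $s_1 \Rightarrow s_2$ then $s_2{\downarrow}_P \subseteq s_1{\downarrow}_P$. 


Proposition 2 ([25]). If $s_1 \Rightarrow s_2$ and $s_1 \sqsubseteq s_1'$ then for some $s_2'$, $s_1' \Rightarrow s_2'$ and 
$s_2 \sqsubseteq s_2'$. 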


Example 3. The first $\ell_1$ state in Fig. 2 shows that there is an infinite loop when 
$p \ge 2$. By Prop. 1, the parametric zones of all states following the dashed red 
edge are contained in $p \ge 2$. So we can prune the PZG at the dashed red arrow, 
since no new parameter valuations will be found. 


Fig. 2. PZG of the PTA $\mathcal{A}_1$ from Fig. 1 


(a) PTA $\mathcal{A}_2$    (b) Its PZG with an infinite accepted run, but no loop 

Fig. 3. PTA $\mathcal{A}_2$ with the corresponding PZG 


Example 4. Fig. 3 shows PTA $\mathcal{A}_2$ and its infinite PZG. The transition can only 
become enabled when $p \ge 5$. Each transition must happen within the following 
$p$ time units, so after $n > 0$ iterations, $5 \le x - y \le n \times p$. Note that $s_1 \Rightarrow s_2$ and 
$s_1 \sqsubseteq s_2$. By Prop. 2, for some $s'$, $s_2 \Rightarrow s'$ and $s_2 \sqsubseteq s'$. Repeating the argument, we 
can construct an infinite trace. So, although the PZG has no cycle, the presence 
of an infinite path can be deduced even if we prune the PZG at the dashed edge. 


3 Sound and Complete Liveness Parameter Synthesis 


Given a PTA A, we aim at synthesising the parameter valuations v for which the 
TA v(A) contains an infinite accepted run. Our algorithms operate by searching 
the PZG $(S, s_0, \Rightarrow, A)$ for accepting “lassos” or, as in Ex. 4, 6 and 7, even for 
accepting “spirals”. We write $\Rightarrow^+$ ($\Rightarrow^*$) for the transitive (respectively reflexive-transitive) closure of $\Rightarrow$. 
An accepting lasso on $s_1$ consists of two finite paths $s_0 \Rightarrow^* s_1 \Rightarrow^+ s_1$, such that 
$s_1 \in A$. More generally, an accepting spiral on $s_1$ consists of two finite paths 
$s_0 \Rightarrow^* s_1 \Rightarrow^+ s_2$, with $s_1 \in A$ and $s_1 \sqsubseteq s_2$. 


Proposition 3. If the PZG of PTA $\mathcal{A}$ contains an accepting spiral on $s_1$, then 
for all $v \in s_1{\downarrow}_P$, $v(\mathcal{A})$ contains an (infinite) accepted run. 


Proof. Assume $s_0 \Rightarrow^* s_1 \Rightarrow^+ s_2$ with $s_1 \in A$ and $s_1 \sqsubseteq s_2$. Note that $s_2 \in A$, 
since $\sqsubseteq$ only holds between states with the same location. Then by monotonicity, 
$s_1{\downarrow}_P \subseteq s_2{\downarrow}_P$ and by Prop. 1, $s_2{\downarrow}_P \subseteq s_1{\downarrow}_P$, so $s_1{\downarrow}_P = s_2{\downarrow}_P$. By Prop. 2, there 
exists some $s_3$ such that $s_2 \Rightarrow^+ s_3$ and $s_2 \sqsubseteq s_3$. We can repeat this to construct 
an infinite accepted run from $s_1$, with the constant parametric constraint $s_1{\downarrow}_P$. 
The states from $s_0 \Rightarrow^* s_1$ have an even larger constraint (Prop. 1). By the 
correspondence between runs in the PTA and runs in the PZG, we obtain an 
infinite accepted run in $v(\mathcal{A})$ for every $v \in s_1{\downarrow}_P$. 


The reverse of Prop. 3 is not true. An infinite PZG could contain an infinite 
path that does not form a lasso (or even a spiral). Such an infinite path in the 
PZG may or may not correspond to a concrete TA run. 


Example 5. The situation of $\mathcal{A}_3$ in Fig. 4 is quite different 
from Ex. 4. The PZG of $\mathcal{A}_3$ has an infinite path $(\ell_0, Z_i)$, 
where $Z_i$ contains the invariant $x \le 1 \wedge y \le p$ and the 
additional constraints $y - x = i \wedge p \ge i$. Note that at most $p$ 
transitions can happen in $\mathcal{A}_3$, since we cannot wait longer 
when $y \ge p$. So $v(\mathcal{A}_3)$ has only finite runs for any $v$. We call 
this infinite path infeasible, since $\bigcap_i (Z_i{\downarrow}_P) = \emptyset$. 


Fig. 4. PTA $\mathcal{A}_3$. 
3.1 Soundness and Completeness 


In contrast to TA, where both reachability and liveness properties are decid- 
able [1], it is well-known that even reachability-emptiness for PTA is undecid- 
able [2,3]. So in particular, we cannot expect a terminating, sound and complete 
algorithm for liveness synthesis. Instead, our algorithms are semi-algorithms, 
which enumerate a number of aggregate solutions, but may not terminate. Each 
aggregate solution will be presented as a convex polyhedral constraint on the 
parameters (“parametric zone” ). 

Such semi-algorithms can either enumerate a finite number of aggregate solutions 
(after which they could terminate or diverge), or enumerate an infinite number of 
aggregates (and hence never terminate). Fig. 5 shows an example 
where the set of solutions, $p \in \{1, 2, 3, \ldots\}$, is not equivalent 
to a finite disjunction of convex polyhedra, so no terminating 
algorithm can enumerate all aggregate solutions.¹ 

Fig. 5. PTA $\mathcal{A}_4$ 


In the rest of this section, we introduce and discuss various soundness and 
completeness requirements for semi-algorithms. Assume that the algorithm is 
run on an input PTA $\mathcal{A}$ and let $Sol$ be the set of all solutions, i.e. $Sol = \{v \mid 
v(\mathcal{A})$ has an accepted run$\}$. Assume that the algorithm enumerates a finite or 
infinite collection of aggregate solutions, in the form of parametric zones $Z_i$. 


Partial correctness: This traditional correctness criterion requires that if the 
algorithm terminates, then $\bigcup_i Z_i = Sol$, i.e. the finite output characterizes 
exactly all correct parameter valuations. 


Soundness: This criterion also provides some guarantee when the algorithm 
diverges. It requires that all enumerated solutions are correct, i.e. $\bigcup_i Z_i \subseteq Sol$. 


Completeness: We call a semi-algorithm complete if it enumerates all solutions, 
i.e. $Sol \subseteq \bigcup_i Z_i$. Enumerating $p = 1$, $p = 2$, ... is complete for $\mathcal{A}_4$. 

Note that for reachability, a simple Breadth-First Search (BFS) over the PZG 
would yield a sound and complete (but not always terminating) semi-algorithm. 
For liveness, this is insufficient: the algorithm would miss infinite paths that do 
not form a cycle. Still, the following trivial semi-algorithm, EnumQ, would be 
sound and complete: “Enumerate all rational parameter valuations v, decide if 
v(A) has an accepting loop [1] and, if so, emit {v}.” Although it is sound and 
complete, this algorithm is quite unsatisfactory, since it will never terminate, and 
it will never aggregate solutions in larger polyhedra. To distinguish PZG-based 
algorithms, we need a weaker form of completeness. 
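
The EnumQ idea can be sketched for a single parameter as follows. This is an illustration only: `has_accepting_loop` is a hypothetical stand-in for the decision procedure on the timed automaton $v(\mathcal{A})$ [1], and the Calkin-Wilf ordering (via Newman's recurrence) is just one convenient way to enumerate every non-negative rational exactly once; the source does not prescribe an enumeration order.

```python
from fractions import Fraction
from itertools import islice

def nonneg_rationals():
    """Yield 0, then every positive rational exactly once (Calkin-Wilf order)."""
    yield Fraction(0)
    q = Fraction(1)
    while True:
        yield q
        # Newman's recurrence: next = 1 / (2*floor(q) - q + 1)
        q = 1 / (2 * (q.numerator // q.denominator) - q + 1)

def enum_q(has_accepting_loop):
    """Emit each valuation v for which the (hypothetical) oracle accepts v(A)."""
    for v in nonneg_rationals():
        if has_accepting_loop(v):
            yield v

# Assumed oracle for an automaton like A4, whose solutions are p in {1, 2, 3, ...}.
def a4_oracle(v):
    return v.denominator == 1 and v >= 1

first_solutions = list(islice(enum_q(a4_oracle), 3))
```

The generator never stops on its own and only ever emits single points, which illustrates why this semi-algorithm is sound and complete but of little practical use.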


Completeness for symbolic lassos: A semi-algorithm is complete for symbolic 
lassos if it enumerates all parameter valuations leading to accepting lassos in the 
PZG, i.e. $\bigcup_i Z_i$ contains $s{\downarrow}_P$, when the PZG contains an accepting lasso on $s$. 

Completeness for symbolic lassos is weaker than completeness, since it may 
miss parameter valuations $v$ for which $v(\mathcal{A})$ has an accepted run, but this only 
happens when the PZG has an infinite path that does not end in a cycle. 


¹ It is not even obvious that $\bigcap_i (Z_i{\downarrow}_P)$ can be represented by a finite conjunction. 
4 Semi-Algorithms for Liveness Parameter Synthesis 


In this section, we discuss three semi-algorithms for liveness parameter synthesis. 
In Sec. 4.1, we discuss the previous approach [11,25], based on Nested Depth- 
First Search (NDFS). All NDFS-based variants turn out to be incomplete for 
symbolic lassos. In Sec. 4.2, we introduce a simple algorithm based on Breadth- 
First Search (BFS), which analyses the Strongly Connected Components (SCC) 
at each new level. We show that the BFS-based algorithm is complete for sym- 
bolic lassos. Finally, Sec. 4.3 introduces our new Bounded Synthesis with Iter- 
ative Deepening (BSID) algorithm. BSID is also complete for symbolic lassos, 
and it is compatible with all NDFS enhancements. 


4.1 Nested Depth-First Search with Enhancements 


The NDFS algorithm (Alg. 1) is run on the PZG, with initial state $s_0$, accepting 
states $A$, and NEXT-STATE($s$) enumerating the $\Rightarrow$-successors. We first explain 
basic NDFS [13], cf. the uncoloured parts of Alg. 1. The goal of the outer blue 
search (ll. 4-13) is to visit all states in DFS order, and just before backtracking, 
call the red search on all accepting states (l. 12). Note that states on the DFS 
stack are cyan (l. 6), and states that are handled completely are blue (l. 13). The 
goal of the inner red search (ll. 14-21) is to detect if there is an accepting cycle. 
It colours visited states red (l. 16), to ensure that states are visited at most once. 
It reports an accepting cycle (l. 20) when a cyan state is encountered. 
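
As an illustration of the basic colouring scheme only, here is a minimal explicit-state sketch of NDFS on a finite graph. It is not the parametric algorithm of Alg. 1: there are no zones, constraints or enhancements, it merely answers whether an accepting cycle is reachable, and the names `ndfs`, `succ` and `accepting` are chosen for this example.

```python
def ndfs(succ, s0, accepting):
    """Return True iff an accepting cycle is reachable from s0.

    succ(s) returns the successors of s; accepting is a set of states.
    """
    cyan, blue, red = set(), set(), set()

    def dfs_red(s):
        red.add(s)
        for t in succ(s):
            if t in cyan:
                return True  # t is on the DFS stack: cycle through the seed
            if t not in red and dfs_red(t):
                return True
        return False

    def dfs_blue(s):
        cyan.add(s)  # s is now on the DFS stack
        for t in succ(s):
            if t not in cyan and t not in blue and dfs_blue(t):
                return True
        # Just before backtracking, start the red search from accepting states.
        if s in accepting and dfs_red(s):
            return True
        cyan.discard(s)
        blue.add(s)  # s is handled completely
        return False

    return dfs_blue(s0)
```

For example, on the graph `{0: [1], 1: [2], 2: [1]}` the call `ndfs(lambda s: g[s], 0, {2})` finds the cycle through state 2, whereas with `{0}` as the accepting set no accepting cycle exists.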


Cumulative pruning (pink) [11,25]. For synthesis, we collect the Constraints 
that lead to accepting cycles (l. 20). We prune the search when the parametric 
constraint of some state is included in Constraints (ll. 5, 15). This is justified 
by Prop. 1, since all successors of the pruned state will have an even smaller 
parametric constraint. Prop. 1 also implies that all states on a cycle have the 
same parametric constraint. So we also prune the red search, by restricting the 
search for a cycle to the current parametric constraint (l. 18). 


Subsumption ($\sqsubseteq$) [22,25]. This pruning strategy takes advantage of the 
subsumption relation between states. The accepting lassos reachable from red 
states $s$ are already included in Constraints. By Prop. 3, any lasso on state 
$t \sqsubseteq t'$ can be simulated by $t'$. Hence, we immediately prune the search when 
we encounter a state $t \sqsubseteq$ Red, i.e. $\exists t'.\ t \sqsubseteq t' \in$ Red (ll. 11, 21). We exploit the 
subsumption structure once more: if $t \sqsupseteq$ Cyan, i.e. $\exists t'.\ t \sqsupseteq t' \in$ Cyan (l. 19), we 
have found an accepting spiral, which implies there is an accepted run, by Prop. 3. 


Lookahead (yellow). The lookahead strategy is new (in this context) and 
allows for early detection of accepting cycles in dfsBlue. It looks for a transition 
to a cyan state (l. 7), which is on the DFS stack. If the source or target of this 
transition is accepting, then the cycle is accepting as well and reported at l. 8. 


Accepting First. This is a new strategy, aimed at increasing the chance 
of finding an accepting cycle early in the search, to promote more pruning. It 
simply works by picking accepting successors before their siblings at ll. 10, 17. 
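Alg. 1 Collecting NDFS with strategies 
 1: procedure NDFS 
 2:   Cyan := Blue := Red := $\emptyset$; Constraints := $\emptyset$ 
 3:   dfsBlue($s_0$) 
 4: procedure dfsBlue($s$) 
 5:   if $s{\downarrow}_P \subseteq$ Constraints then Blue := Blue $\cup\ \{s\}$; return 
 6:   Cyan := Cyan $\cup\ \{s\}$ 
 7:   if $\exists t \in$ NEXT-STATE($s$) $\cap$ Cyan with $s \in A \vee t \in A$ then 
 8:     Constraints := Constraints $\cup\ s{\downarrow}_P$    ▷ Report accepting cycle 
 9:   else 
10:     for all $t \in$ NEXT-STATE($s$), accepting states first, do 
11:       if $t \notin$ Blue $\cup$ Cyan $\wedge\ t \not\sqsubseteq$ Red then dfsBlue($t$) 
12:     if $s \in A$ then dfsRed($s$) 
13:   Blue := Blue $\cup\ \{s\}$; Cyan := Cyan $\setminus\ \{s\}$ 
14: procedure dfsRed($s$) 
15:   if $s{\downarrow}_P \not\subseteq$ Constraints then 
16:     Red := Red $\cup\ \{s\}$ 
17:     for all $t \in$ NEXT-STATE($s$), accepting states first, do 
18:       if $t{\downarrow}_P = s{\downarrow}_P$ then 
19:         if $t \sqsupseteq$ Cyan then 
20:           Constraints := Constraints $\cup\ t{\downarrow}_P$    ▷ Report cycle at state $t$ 
21:         else if $t \not\sqsubseteq$ Red then dfsRed($t$) 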


Layering (not shown here) [25]. The layering strategy gives priority to states 
with large parametric constraints, since these potentially prune many other 
states. To this end, successors in the next parametric layer are delayed, which is 
sound, since every cycle must lie entirely in the same parametric layer (Prop. 1). 


Proposition 4. All mentioned NDFS variants are sound and partially correct. 


Proof. Partial correctness is shown in [25]. Soundness follows from Prop. 3, since 
all collected constraints correspond to accepting spirals. □ 


Example 6. None of the mentioned NDFS variants is complete for 
symbolic lassos. Consider $\mathcal{A}_5$ in Fig. 6. Its PZG extends 
Fig. 3(b) with a transition from all states to one additional 
accepting state with self-loop, $s = (\ell_1, p + x \ge y \ge 6 + x)$, 
where $s{\downarrow}_P = (p \ge 6)$. All NDFS variants (including all 
combinations of cumulative pruning, subsumption, lookahead, 
accept-first, and layering) allow the execution that diverges 
on the infinite $p \ge 5$ path, so they will never detect the 
accepting cycle on $p \ge 6$. 

Fig. 6. PTA $\mathcal{A}_5$ 
4.2 Breadth-First Search 


We now describe a BFS-based synthesis algorithm for accepting cycle detection. 
As in Alg. 1, our BFS algorithm maintains a parameter constraint Constraints, 
initially empty. The algorithm basically explores the newly computed symbolic 
states in a breadth-first search manner, i.e. by iteratively computing all sib- 
lings at a given depth level, before computing their own children states. Then, 
whenever one of these new states is identical to a state already present in the 
state space, a cycle may exist. In this case, we run an SCC-detection algorithm 
(inspired by Tarjan) and, if there is indeed a cycle, we add the cycle parameter 
constraint to the result Constraints. Remember that, from Prop. 1, all states in 
such a cycle have the same parametric constraint. 

Note that, in contrast to the algorithms in Sec. 4.1 and 4.3, we have to use 
state equality, since using unrestricted subsumption could introduce spurious cy- 
cles (cf. examples in [22]). However, we do use cumulative pruning, as in Sec. 4.1: 
whenever the parametric constraint of a new state $s$ is included in the current 
result Constraints (i.e. $s{\downarrow}_P \subseteq$ Constraints), we discard it, as no potential loop 
starting from this state, or from its successors, can improve Constraints anyhow. 

In contrast to the NDFS-based algorithms in Sec. 4.1, our BFS algorithm is 
complete for symbolic lassos, since every lasso will appear at some level, and the 
SCC algorithm will eventually detect it. 


Proposition 5. The BFS+SCC algorithm is sound, partially correct, and com- 
plete for symbolic lassos. 
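
The level-by-level scheme can be sketched on an explicit graph with lazily computed successors. This is a simplified illustration, not the IMITATOR implementation: states are plain hashable values compared by equality, no parametric constraints or cumulative pruning are modelled, the SCC routine is a textbook recursive Tarjan, and the function names and the `max_levels` cut-off are assumptions of this example.

```python
def sccs(nodes, edges):
    """Tarjan's algorithm; edges maps a node to its successors within nodes."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            out.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return out

def bfs_scc(succ, s0, is_accepting, max_levels):
    """Explore level by level; return accepting states found on a cycle."""
    seen, edges, frontier, found = {s0}, {}, [s0], set()
    for _ in range(max_levels):
        nxt, closed = [], False
        for s in frontier:
            edges[s] = list(succ(s))
            for t in edges[s]:
                if t in seen:
                    closed = True  # edge into a known state: a cycle may exist
                else:
                    seen.add(t)
                    nxt.append(t)
        if closed:
            for comp in sccs(seen, edges):
                cyclic = len(comp) > 1 or comp[0] in edges.get(comp[0], ())
                if cyclic:
                    found |= {s for s in comp if is_accepting(s)}
        frontier = nxt
        if not frontier:
            break
    return found
```

On a graph shaped like the PZG of Ex. 6 (an infinite chain in which every state also leads to one accepting state with a self-loop), the accepting cycle is found after two levels, even though a depth-first search could follow the chain forever.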


4.3 Bounded Synthesis with Iterative Deepening 


One way to enforce termination is to explore the PZG up to a given depth 
(Bounded Synthesis). However, this could make the result incomplete. Therefore, 
as long as there are unexplored states, the bound should be increased (Iterative 
Deepening), to synthesize parameter valuations for deeper accepting cycles. 

Alg. 2 presents this procedure, called BSID. Although all strategies in Sec. 4.1 
are compatible with this approach, only cumulative pruning and subsumption 
are shown in the algorithm. It repeatedly explores the PZG from an initial 
depth depthinit, incrementing the depth by depthstep at each iteration (l. 8). 
The termination criterion is that the current exploration terminated without 
reaching its current depth (l. 7). In this case, the result is complete. Neither dfsBlue 
nor dfsRed goes beyond the current exploration depth (ll. 10, 20). 

To avoid some duplicate work at different iterations, the set of blue states is 
split using two colours: Green states have a descendant not completely processed 
due to the depth limit, and should thus be considered in further iterations; Blue 
states are those whose children have already been completely explored and thus 
should not be considered anymore. Hence, at the beginning of an iteration, all 
colours but blue are reset (l. 5). States are coloured green when they are at the 
depth limit (l. 10) or if they have a green successor (l. 16). Note that dfsBlue is 
not called for blue states at l. 14, but it may be called for states that have been 
coloured green at the previous iteration but have been uncoloured. 
Proposition 6. The BSID algorithm is sound, partially correct, and complete 
for symbolic lassos. 


Proof. Soundness follows from Prop. 3, since every collected constraint corre- 
sponds to an accepting spiral. Completeness for symbolic lassos follows, since 
every accepting cycle in the PZG is entirely present at some depth. When NDFS 
is run beyond that depth, it will report the constraint leading to that cycle. Par- 
tial correctness follows, since the algorithm only terminates if the last run did 
not reach the depth-bound, in which case the PZG is searched exhaustively. □ 


Example 7. On both $\mathcal{A}_2$ (Fig. 3, Ex. 4) and $\mathcal{A}_5$ (Fig. 6, Ex. 6), BSID will correctly 
report $p \ge 5$ and then terminate; for $\mathcal{A}_5$ it may first report $p \ge 6$, depending 
on the search order. It is actually the combination of bounded synthesis and 
subsumption that makes the algorithm complete for this example. The bound 
ensures that NDFS is run after the first iteration, and subsumption ensures that 
an accepting spiral is found as explained in Ex. 4. At this point, the constraint 
$p \ge 5$ is discovered, which prunes the rest of the PZG, ensuring termination. 


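Alg. 2 Iterative deepening NDFS with cumulative constraint pruning and subsumption 

 1: procedure ITERATIVECOLLECTNDFSSUB(depthinit, depthstep) 
 2:   Cyan := Blue := Red := Green := $\emptyset$; Constraints := $\emptyset$ 
 3:   depth := depthinit 
 4:   repeat 
 5:     Cyan := Red := Green := $\emptyset$ 
 6:     dfsBlue($s_0$, 0) 
 7:     if the depth bound was not reached then return 
 8:     depth := depth + depthstep 
 9: procedure dfsBlue($s$, $d$) 
10:   if $d$ exceeds depth then Green := Green $\cup\ \{s\}$; return 
11:   if $s{\downarrow}_P \subseteq$ Constraints then Blue := Blue $\cup\ \{s\}$; return 
12:   Cyan := Cyan $\cup\ \{s\}$ 
13:   for all $t \in$ NEXT-STATE($s$) do 
14:     if $t \notin$ Blue $\cup$ Green $\cup$ Cyan $\wedge\ t \not\sqsubseteq$ Red then dfsBlue($t$, $d + 1$) 
15:   if $s \in A$ then dfsRed($s$, $d$) 
16:   if some $t \in$ NEXT-STATE($s$) is in Green then Green := Green $\cup\ \{s\}$ 
17:   else Blue := Blue $\cup\ \{s\}$ 
18:   Cyan := Cyan $\setminus\ \{s\}$ 
19: procedure dfsRed($s$, $d$) 
20:   if $d$ does not exceed depth $\wedge\ s{\downarrow}_P \not\subseteq$ Constraints then 
21:     Red := Red $\cup\ \{s\}$ 
22:     for all $t \in$ NEXT-STATE($s$) do 
23:       if $t{\downarrow}_P = s{\downarrow}_P$ then 
24:         if Cyan $\sqsubseteq t$ then 
25:           Constraints := Constraints $\cup\ t{\downarrow}_P$    ▷ Report cycle at state $t$ 
26:         else if $t \not\sqsubseteq$ Red then dfsRed($t$, $d + 1$) 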
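
The interplay of a depth bound with iterative deepening can be illustrated on an explicit graph. The sketch below deliberately swaps in a simpler technique than Alg. 2: instead of a depth-bounded NDFS with colours, it first collects all states within breadth-first distance `depth` and then looks for accepting cycles inside that finite region, and it models neither parametric constraints nor subsumption. All function names and the `max_depth` cut-off are assumptions of this example.

```python
def states_within(succ, s0, depth):
    """States at BFS distance <= depth, and whether the bound cut anything off."""
    seen, frontier = {s0}, [s0]
    for _ in range(depth):
        nxt = []
        for s in frontier:
            for t in succ(s):
                if t not in seen:
                    seen.add(t)
                    nxt.append(t)
        frontier = nxt
    cut = any(t not in seen for s in frontier for t in succ(s))
    return seen, cut

def on_cycle(succ, region, a):
    """True iff a lies on a cycle that stays inside region."""
    stack, seen = [a], set()
    while stack:
        s = stack.pop()
        for t in succ(s):
            if t == a:
                return True
            if t in region and t not in seen:
                seen.add(t)
                stack.append(t)
    return False

def iterative_deepening(succ, s0, is_accepting, depth_init, depth_step, max_depth):
    """Return (accepting states on cycles, exact?) with growing depth bounds."""
    depth, found = depth_init, set()
    while depth <= max_depth:
        region, cut = states_within(succ, s0, depth)
        found |= {a for a in region if is_accepting(a) and on_cycle(succ, region, a)}
        if not cut:
            return found, True   # bound never reached: the result is exact
        depth += depth_step
    return found, False          # gave up: only an under-approximation
```

The second component of the result mirrors the termination criterion of BSID: the answer is exact only if the last exploration did not reach its bound; otherwise the collected cycles are an under-approximation.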
5 Experimental Evaluation 


We conducted some experiments, to compare all algorithms on the number of 
cases they can solve and on their efficiency. In order to compare cases in which 
an algorithm does not terminate, we also counted the number of reported cycles. 

To this end, we implemented our new algorithms BFS and BSID in IMITATOR 3,² 
and we also reimplemented all NDFS-based algorithms [11,25] in 
a unified DFS framework. We ran all algorithms on a benchmark, distributed 
with IMITATOR [4] and also used in [25]. The size of the benchmarks is shown 
in Tab. 1 (columns L, X, P). We used a timeout of 120 s.³ 

In Tab. 1, we compare some combinations of NDFS enhancements (Sec. 4.1), 
extending the baseline (cumulative pruning). The results show that subsumption 
alone performs worst, while lookahead solves more cases, e.g. ll. 3-6 of Tab. 1. 
Interestingly, adding our new accepting-first strategy succeeds in finding cycles (l. 12) 
that are missed by all other strategies. Finally, adding the layering approach 
leads to success in most cases and provides the fastest results on average, but it 
finds no accepting cycles at all for five cases where others found some. 

Tab. 2 compares the new algorithms BFS (Sec. 4.2) and BSID (Sec. 4.3), 
including all enhancements (except layering) under various depth settings. BSID 
is generally faster than BFS, in particular with an iterative depth-step of 5. The 
performance of BFS is closest to BSID with depth-step 1. The first two columns 
evaluate the effectiveness of using the green colour (ng = -no-green). Without 
green, no information from previous iterations is reused. Avoiding recomputation 
is faster, leading to a deeper exploration within the time limit (e.g. on l. 2). 

Comparing both tables, we notice that for ll. 15-17 NDFS synthesised some 
parameter values that are missed by BSID and BFS. BSID is generally faster 
than its NDFS counterpart A+L+Sub, but NDFS with layering is even faster. 


6 Case Study: the Bounded Retransmission Protocol 


The Bounded Retransmission Protocol (BRP) has been analysed in [16,14,19], 
but we now synthesise the most liberal parameter constraints to obtain some 
reachability and liveness guarantees. For reachability, these constraints are more 
liberal than proposed in previous work. Synthesising parameter constraints for 
liveness properties is new, and our new algorithms were required to achieve this. 

Our starting point is the PTA model from [14]. Each session starts with a 
transmission request S_in and is terminated by an indication S_ok, S_nok or S_dk 
(“don’t know”). The BRP is regulated by clocks, with some timing parameters: 
TD is the delay in the communication channel, TS and TR indicate the time that 
the sender (receiver) should wait. Finally, SYNC models the waiting time in case 
sender and receiver get out of sync. The maximum number of retransmissions is 
a discrete parameter, which we fixed in most experiments to MAX = 2. 


² Algorithms are integrated in IMITATOR v.3. The artifact is at doi.org/10.5281/zenodo.4115919 
and can be run at: imitator.lipn.univ-paris13.fr/artifact. 

³ The experiment ran on a DELL PowerEdge FC640, 2 processors (Intel Xeon Silver 
4114 @ 2.20 GHz), Debian GNU/Linux 10, 187.50 GiB memory. 


6.1 Synthesis for Reachability Properties: deriving sharper bounds 


To illustrate synthesis for reachability properties, we first enhance the parametric 
verification experiments from [14,19] in IMITATOR. The reachability properties 
are: (C) the channels will never be used simultaneously; and (R) the receiver 
gets a correct initial frame in each session. Property (C) is formalised as: 


property := #synth AGnot(loc[channelK] = in_transitK & loc[channelL] = in_transitL) 


We synthesise the safe parameter constraints for “unreachability” by:⁴ 


imitator -mergeq -comparison inclusion brp_Channels.imi brp_Channels.imiprop 


IMITATOR derives within 2s the exact constraint TS > 2*TD: The sender should 
wait (TS) for the round-trip time of a message + acknowledgement (2*TD). 
Property (R) is formalised by adding an error location FailureR to the 
receiver, which should be unreachable. Since we learned the constraint TS>2*TD 
in the previous run, we now include this constraint in the initial condition. 
Within 1s, IMITATOR synthesizes the exact constraint for this safety property: 


imitator -mergeq -comparison inclusion brp_RC.imi brp_RC.imiprop 
SYNC + TS >= TR + TD & TS > 2*TD & TR > 4*TS + 3*TD 


The fact that this can be computed is not surprising, but it is surprising that 
this constraint is more liberal than the one derived in [14,19], which was: 


SYNC >= TR & TS > 2*TD & TR > 2*MAX*TS + 3*TD 


One can easily check that, for MAX = 2, their constraint is strictly stronger 
than ours. So we found more parameter values for which BRP is correct. By con- 
struction, we found the most liberal constraint for MAX = 2, and we confirmed 
a similar result for up to MAX = 20. We cannot handle a parametric MAX. 
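
The comparison for MAX = 2 can be sanity-checked numerically. The sketch below samples a small integer grid, so it is a plausibility check rather than a proof; the analytical argument is that TS > 2*TD together with TD >= 0 gives TS >= TD, hence SYNC >= TR implies SYNC + TS >= TR + TD. The function names and the grid bounds are choices made for this example.

```python
from itertools import product

MAX = 2

def ours(SYNC, TS, TR, TD):
    """Constraint synthesised for property (R) with MAX = 2."""
    return SYNC + TS >= TR + TD and TS > 2 * TD and TR > 4 * TS + 3 * TD

def theirs(SYNC, TS, TR, TD):
    """Constraint reported in previous work [14,19]."""
    return SYNC >= TR and TS > 2 * TD and TR > 2 * MAX * TS + 3 * TD

grid = range(0, 21)  # sampled non-negative integer values for each parameter

# Every sampled valuation allowed by the earlier constraint is allowed by ours.
implication_holds = all(
    ours(*v) for v in product(grid, repeat=4) if theirs(*v)
)

# Valuations allowed by ours but rejected by the earlier constraint.
witnesses = [v for v in product(grid, repeat=4) if ours(*v) and not theirs(*v)]
```

For instance, (SYNC, TS, TR, TD) = (4, 1, 5, 0) satisfies the new constraint but violates SYNC >= TR, so the earlier constraint is strictly stronger.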


6.2 Liveness: approximations by bounded synthesis 


Next, we want to measure the overhead of liveness checking. To this end, we 
make the FailureR location an accepting self-loop, and use a liveness property. Note 
that in this case, the synthesised constraint will indicate the error condition. 


accepting loc FailureR: invariant True when True goto FailureR; 
init := ... & TS > 2*TD 
property := #synth CycleThrough(accepting) 


Since we search for an accepting loop, inclusion and merging are unsound, but 
still complete. However, we can safely apply subsumption in NDFS. Without 
inclusion, the zone graph is infinite, so we are forced to resort to bounded syn- 
thesis, which only provides an under-approximation. Hence, we also use iterative 
deepening (BSID, Sec. 4.3). The depth limit is reached in 6s. 


⁴ Inclusion and merging are sound and complete for reachability [7]. Inclusion applies 
maximal subsumption, while merging combines zones with exact convex hull. 


imitator brp_RC.imi accepting.imiprop -depth-step=5 -depth-limit=25 -recompute-green 
4*TS + 3*TD >= TR & TS > 2*TD 
OR TR + TD > SYNC + TS & TS > 2*TD 


We could have searched even deeper for more liberal constraints, but it can 
be easily checked that this error constraint is equivalent to the complement of 
the safety constraint (within the initial condition), see Sec. 6.1, property (R). 
Hence, we can conclude that we have already synthesised the exact constraint. 
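
The equivalence can be spelled out with De Morgan's law. In the derivation below, $A$, $B$ and $I$ are abbreviations introduced only for this check: $A$ and $B$ are the two remaining conjuncts of the safety constraint from Sec. 6.1, and $I$ is the initial condition.

```latex
\begin{align*}
A &:= (SYNC + TS \ge TR + TD), \quad
B := (TR > 4 \cdot TS + 3 \cdot TD), \quad
I := (TS > 2 \cdot TD) \\
I \wedge \neg (A \wedge B)
  &= (I \wedge \neg B) \vee (I \wedge \neg A) \\
  &= (4 \cdot TS + 3 \cdot TD \ge TR \;\wedge\; TS > 2 \cdot TD)
     \;\vee\; (TR + TD > SYNC + TS \;\wedge\; TS > 2 \cdot TD)
\end{align*}
```

The right-hand side is exactly the error constraint reported by the bounded liveness run, so within $I$ it is the complement of the safety constraint $A \wedge I \wedge B$.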


6.3 Proper Liveness Properties 


GF(S_in). Next, we will synthesise constraints for an actual liveness property, 
stating that the number of new sessions is infinite. We use Spot [15] to generate 
a Büchi automaton for the negation of this formula, and add the result as a 
monitor to the IMITATOR model, synchronising with the sender process. We add 
the constraints on correctness that we learned before to the initial constraints: 


init := ... & SYNC >= TR & TS > 2*TD & TR > 4*TS + 3*TD 


The following command tries to synthesize all parameters (within the initial 
constraint) for which an accepting loop is reachable, i.e. GF S_in is violated. We 
replaced subsumption by full inclusion, since otherwise IMITATOR gets lost in 
the infinite parametric zone graph. Recall that inclusion is complete but unsound 
for NDFS, so this provides an over-approximation of the constraints. 


imitator -no-subsumption -comparison inclusion brp_GF_S_in_RC.imi accepting.imiprop 


IMITATOR replies False in 1 second, so there is no reachable accepting cycle. 
Since this was an over-approximation, the result is conclusive: GF S_in holds 
under all parameter values inside this initial constraint. Note that, in principle, 
the property could be violated outside this initial condition. We can rerun the 
same experiment with the more general initial condition TS > 2*TD. IMITATOR 
confirms that the property still holds, but checking this larger space takes 19s. 


G(S_in => F(S_ok ∨ S_nok ∨ S_dk)). Using the same method, IMITATOR 
confirms in 16s that this response property also holds: every session start is 
followed by some indication. 


imitator -no-subsumption -comparison inclusion brp_GSinFSdk.imi accepting.imiprop 


G(S_in => F(S_ok ∨ S_nok)). Let us pretend that we forgot the indication S_dk 
(don’t know). This time, we search for a symbolic counter-example (using the 
option -witness), under the initial condition TS > 2*TD. 


property := #witness CycleThrough(accepting) 
imitator brp_GSinFSnok.imi accepting_one.imiprop 


As expected, IMITATOR finds a counter-example quickly (within 0.04s). 
7 Conclusion 


We presented and evaluated new semi-algorithms solving the liveness parame- 
ter synthesis problem for Parametric Timed Automata. We also introduced new 
soundness and completeness notions for such semi-algorithms. The new algo- 
rithms, based on BFS and Bounded Synthesis (BSID), at least enumerate all 
parameters leading to accepting lassos in the parametric zone graph. We showed 
that this property does not hold for all previous algorithms, which were based on 
NDFS. Our new algorithms are less sensitive to the particular search order than 
the previous NDFS algorithms, that could get stuck in some branch of the PZG. 

Tab. 3 (left) shows the soundness and completeness status of all considered 
algorithms. Full inclusion and BS-n can only provide an over-approximation (resp. 
under-approximation). The enumQ algorithm is complete, but never terminates 
(indicated by ××), so its partial soundness and completeness results are vacuous 
(indicated by (✓)). Although the problem is undecidable, one might still hope 
for an algorithm that enumerates all possible solutions (like enumQ, generating 
and testing all rational solutions) and produces a finite set of aggregate solutions 
(if it exists). The algorithm should terminate for practical cases. 

Tab. 3 (right) shows the results of our algorithms for examples $\mathcal{A}_1$-$\mathcal{A}_6$. They 
either terminate with an exact (✓) or partial ((✓)) result, or diverge (×). In one 
case the addition of the layers strategy is needed to obtain a partial result ((L)). 

Our last example shows another challenge to obtain a 
complete approach. The PZG of PTA $\mathcal{A}_6$ has a non-cyclic 
infinite path. It seems non-trivial to compute its limit 
constraint automatically. After $n$ steps, the parametric 
constraint is $p \ge n \times q$. So the limit constraint is $q = 0 \wedge p \ge q$. 

Fig. 7. PTA $\mathcal{A}_6$. 

In order to handle cases where the set of solutions is 
not even a finite union of convex sets (Fig. 5), an entirely 
different representation of the solutions would be required. 

Finally, exploiting the component-based structure of networks of PTA using 
a compositional approach, such as the one developed recently for fair paths in 
infinite systems [12], would be an exciting extension. 


Table 3. Soundness and completeness properties of various algorithms.

Rows: NDFS (enhanced), NDFS + inclusion, BFS + SCC, BSID, BS-n (fixed bound), Naive enumQ.
Abstract. We investigate efficient algorithms for the online monitoring
of properties written in metric temporal logic (MTL). We employ
an abstract algebraic semantics based on semirings. It encompasses the 
Boolean semantics and a quantitative semantics capturing the robust- 
ness of satisfaction, which is based on the max-min semiring over the 
extended real numbers. We provide a precise equational characterization 
of the class of semirings for which our semantics can be viewed as an ap- 
proximation to an alternative semantics that quantifies the distance of a 
system trace from the set of all traces that satisfy the desired property. 


Keywords: Online Monitoring · Verification · Quantitative Semantics.


1 Introduction 


Online monitoring is a lightweight verification technique for checking during run- 
time that a system behaves as desired. It has proved to be effective for evaluating 
the correctness of the behavior of complex systems, which includes cyber-physical 
systems (CPSs) that consist of both computational and physical processes. An 
online monitor is a program that observes the execution trace of the system and 
emits values that indicate events of interest or other actionable information. 

It is common to specify monitors using special-purpose formalisms such as 
variants of temporal logic and domain-specific programming languages. In the 
context of cyber-physical systems, logics that are interpreted over signals are 
frequently used. This includes Metric Temporal Logic (MTL) [80] and Signal 
Temporal Logic (STL) [33]. We focus here on properties specified with MTL 
and interpreted over discrete-time signals. We do not restrict the outputs of the 
monitor to Boolean (qualitative) verdicts, but allow for a quantitative interpre- 
tation of property satisfaction that admits various degrees of truth or falsity. 
Such quantitative interpretations of temporal logic have been considered before, 
including several variants of the so-called robust semantics of MTL [22,20,5].

Our starting point is the widely-used spatial robust semantics of MTL [22]. 
This uses the set $\mathbb{R}^{\pm\infty} = \mathbb{R} \cup \{-\infty, \infty\}$ of the extended real numbers as truth
values, where a positive number indicates truth, a negative number indicates 
falsity, and zero is ambiguous. Disjunction is interpreted as max, and conjunction 
is interpreted as min. Two quantitative semantic notions are considered in [22]: 
(1) the robustness degree $\mathrm{degree}(\varphi, u)$ of a trace $u$ w.r.t. a formula $\varphi$, which
is defined in a global way using distances between signals, and (2) the robust
semantics $\rho(\varphi, u)$ of a formula $\varphi$ w.r.t. a trace $u$, which is defined by induction
on the structure of $\varphi$. The former notion is the primary definition that captures
the intuitive idea of the degree of satisfaction, whereas the latter is used as an
approximate estimate. The usefulness of this estimate is justified by establishing
a precise relationship between the two values [22]. The robust semantics of [22]
has been used in prior work on online monitoring [16,15].

We embark on an investigation of how to generalize the robustness framework
of [22] to other notions of quantitative truth values. Instead of focusing
exclusively on the concrete structure $(\mathbb{R}^{\pm\infty}, \sup, \inf, -\infty, \infty)$, we take an
abstract algebraic approach and look at classes of structures that are defined
axiomatically. We start by considering the class of semirings, algebraic structures
of the form $(V, +, \cdot, 0, 1)$ with an addition operation $+$ (which models
disjunction) and a multiplication operation $\cdot$ (which models conjunction)
satisfying a set of equational laws. The class of semirings contains $\mathbb{B} = \{\bot, \top\}$
(the Boolean values), $(\mathbb{R}^{\pm\infty}, \max, \min, -\infty, \infty)$, the max-plus (tropical) semiring
$(\mathbb{R} \cup \{-\infty\}, \max, +, -\infty, 0)$, and $(\mathbb{R}, +, \cdot, 0, 1)$. The semiring of intervals with
(semiring) addition given by $[a, b] \oplus [c, d] = [\max(a, c), \max(b, d)]$ and (semiring)
multiplication given by $[a, b] \otimes [c, d] = [\min(a, c), \min(b, d)]$ is an especially
interesting example, as it can be used to model uncertainty in the truth value: an
element $[a, b]$ indicates that the truth value lies somewhere within this interval.

We use an algebraic generalization of the inductively-defined robust seman- 
tics of [22], as our goal is to obtain online monitors that are time- and space- 
efficient. Our main results are the following: 

– The theorem of [22] that relates $\mathrm{degree}(\varphi, u)$ and $\rho(\varphi, u)$ is generalized from
$\mathbb{R}^{\pm\infty}$ to a class of semirings. The class of semirings for which the theorem
holds admits a precise axiomatic characterization (Theorem 7). To obtain
this, we develop a notion of symbolic quantitative languages that forms a
semantic bridge between quantitative specifications and sets of traces.

– We propose a new algorithm for efficient online monitoring (Theorem
that goes beyond existing algorithms. Prior monitors [16,15] compute max
or min over sliding windows and therefore apply only to semirings that are
linear orders (e.g., $\mathbb{B}$ and $\mathbb{R}^{\pm\infty}$). Our monitoring algorithm applies to values
$V$ that are partial orders or more general semirings. In order to obtain this
algorithm, we reduce the monitoring of formulas of the form $\varphi\,\mathsf{S}_{[a,b]}\,\psi$ and
$\varphi\,\mathsf{U}_{[a,b]}\,\psi$ to a sliding-window aggregation (which is neither max nor min).

We provide an implementation of our algebraic monitoring framework in Rust. 
Our experiments show that our monitors scale reasonably well and they compare 
favorably against the state-of-the-art monitoring tool Reelay [40]. 


2 Algebraic Semantics using Semirings 


A semiring is an algebraic structure $(V, +, \cdot, 0, 1)$, where $+$ is called addition and
$\cdot$ is called multiplication, that satisfies the following properties: (1) $(V, +, 0)$ is a
commutative monoid, (2) $(V, \cdot, 1)$ is a monoid, (3) multiplication distributes over
addition, and (4) $0$ is an annihilator for multiplication. The last two properties
say that $x(y+z) = xy+xz$, $(x+y)z = xz+yz$, and $0x = x0 = 0$ for all $x, y, z \in V$.
We sometimes write $xy$ to mean $x \cdot y$. A semiring $V$ is called idempotent if addition
is idempotent, that is, $x + x = x$ for every $x \in V$. For an idempotent semiring,
we define the partial order induced by $+$ as follows: $x \leq y$ if $x + y = y$. A
homomorphism from a semiring $U$ to a semiring $V$ is a function $h : U \to V$
that commutes with the semiring operations. An epimorphism is a surjective
homomorphism. Let $U$ and $V$ be idempotent semirings and $h : U \to V$ be a
semiring homomorphism. Then, $h$ is monotone (i.e., order-preserving).
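
The semiring axioms can be spot-checked mechanically on small samples. The following Python sketch is only an illustration and is not the paper's Rust implementation: the names `Semiring`, `satisfies_axioms`, `BOOL`, `MAX_MIN` and `ARITH` are hypothetical, and a finite sample can refute an axiom but cannot prove it.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable

INF = float("inf")

@dataclass(frozen=True)
class Semiring:
    """A semiring (V, +, ., 0, 1) given by its operations and constants."""
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any
    one: Any

    def leq(self, x, y):
        """Order induced by addition (idempotent case): x <= y iff x + y = y."""
        return self.add(x, y) == y

# Three of the semirings mentioned in the text (hypothetical variable names).
BOOL = Semiring(lambda x, y: x or y, lambda x, y: x and y, False, True)
MAX_MIN = Semiring(max, min, -INF, INF)
ARITH = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0, 1)

def satisfies_axioms(s, sample):
    """Check the semiring laws on every triple drawn from a finite sample."""
    for x, y, z in product(sample, repeat=3):
        ok = (
            s.add(x, y) == s.add(y, x)                                    # + commutative
            and s.add(s.add(x, y), z) == s.add(x, s.add(y, z))            # + associative
            and s.mul(s.mul(x, y), z) == s.mul(x, s.mul(y, z))            # . associative
            and s.mul(x, s.add(y, z)) == s.add(s.mul(x, y), s.mul(x, z))  # left distributivity
            and s.mul(s.add(x, y), z) == s.add(s.mul(x, z), s.mul(y, z))  # right distributivity
            and s.add(x, s.zero) == x                                     # 0 neutral for +
            and s.mul(x, s.one) == x == s.mul(s.one, x)                   # 1 neutral for .
            and s.mul(x, s.zero) == s.zero == s.mul(s.zero, x)            # 0 annihilates
        )
        if not ok:
            return False
    return True

def is_idempotent(s, sample):
    """Addition is idempotent on the sample: x + x = x."""
    return all(s.add(x, x) == x for x in sample)
```

On these samples, `BOOL` and `MAX_MIN` are idempotent, so `leq` is the induced partial order, whereas `ARITH` is a semiring that is not idempotent.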


Example 1. The set $\mathbb{B} = \{\bot, \top\}$ of Boolean values with disjunction and conjunction
is a semiring. The set $\mathbb{T} = \{\bot, ?, \top\}$ can be endowed with semiring structure
as follows: $x + \bot = x$, $x + \top = \top$, $? + ? = {?}$, $x \cdot \bot = \bot$, $x \cdot \top = x$, and $? \cdot ? = {?}$,
where $\cdot$ is commutative. The structure $\mathbb{T}$ is used to give a three-valued interpretation
of formulas ($?$ is inconclusive). The structure $(\mathbb{R}^{\pm\infty}, \max, \min, -\infty, \infty)$ is
the max-min semiring over the extended reals. The structure $(\mathbb{R}, +, \cdot, 0, 1)$ is a
semiring and $\mathbb{Z}$ (integers) and $\mathbb{N}$ (natural numbers) are subsemirings of it.

We interpret the max-min semiring $\mathbb{R}^{\pm\infty}$ as degrees of truth, where positive
means true and negative means false. The value 0 is ambiguous. For this reason
we also consider a variant of $\mathbb{R}^{\pm\infty}$, where the value 0 is refined into a positive $+0$
(true) and a negative $-0$ (false). We thus obtain the max-min semiring $\mathbb{R}^{\pm\infty}_{\pm 0}$,
which is isomorphic to $\mathbb{B} \times \mathbb{R}^{\infty}_{\geq 0}$, where $\mathbb{R}^{\infty}_{\geq 0} = \{x \in \mathbb{R}^{\pm\infty} \mid x \geq 0\}$.


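For integers $i, j \in \mathbb{Z}$ we define the intervals $[i, j] = \{n \in \mathbb{Z} \mid i \leq n \leq j\}$
and $[i, \infty) = \{n \in \mathbb{Z} \mid i \leq n\}$. For a set $I$ of integers and $n \in \mathbb{Z}$, define
$n + I = \{n + i \mid i \in I\}$ and $n - I = \{n - i \mid i \in I\}$.

For a semiring $V$, an interval $I = [i, j]$ (where $i, j$ are natural numbers) and
an $I$-indexed tuple $\bar{x} = (x_k)_{k \in I}$ whose components are in $V$, we define
$\sum \bar{x} = \sum_{k \in I} x_k = \sum_{k=i}^{j} x_k = x_i + \cdots + x_j$ and
$\prod \bar{x} = \prod_{k \in I} x_k = \prod_{k=i}^{j} x_k = x_i \cdots x_j$. If
the tuple $\bar{x}$ is empty (i.e., $I = \emptyset$) then we define $\sum \bar{x} = 0$ and $\prod \bar{x} = 1$.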

We will consider formulas of Metric Temporal Logic (MTL) interpreted over
traces that are finite or infinite sequences of data items from a set $D$. We write
$D^*$ (resp., $D^+$) for the set of all finite (resp., non-empty finite) sequences over
$D$, and $D^\omega = \omega \to D$ for the set of all infinite sequences over $D$, where $\omega$
is the first infinite ordinal (i.e., the set of natural numbers). We also define
$D^\infty = D^* \cup D^\omega$. We write $\varepsilon$ for the empty sequence and $|u|$ for the length of
a trace, where $|u| = \omega$ if $u$ is infinite. A finite sequence $u \in D^*$ can be viewed
as a function from $\{0, \ldots, |u| - 1\}$ to $D$, that is, $u = u(0)u(1) \ldots u(|u| - 1)$. We
also consider a semiring $V$ whose elements represent quantitative truth values,
and unary quantitative predicates $p : D \to V$. We write $1, 0 : D \to V$ for the
predicates given by $1(d) = 1$ and $0(d) = 0$ for every $d \in D$.

The set $\mathrm{MTL}(D, V)$ of temporal formulas is built from the atomic predicates
$p : D \to V$ using the Boolean connectives $\vee$ and $\wedge$, the unary temporal
connectives $\mathsf{P}_I, \mathsf{H}_I, \mathsf{F}_I, \mathsf{G}_I$, and the binary temporal connectives $\mathsf{S}_I, \overline{\mathsf{S}}_I, \mathsf{U}_I, \overline{\mathsf{U}}_I$,
where $I$ is an interval of the form $[i, j]$ or $[i, \infty)$ with $i, j < \omega$. For every temporal
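$$\rho(p, u, i) = p(u(i))$$

$$\rho(\varphi \vee \psi, u, i) = \rho(\varphi, u, i) + \rho(\psi, u, i) \qquad \rho(\varphi \wedge \psi, u, i) = \rho(\varphi, u, i) \cdot \rho(\psi, u, i)$$

$$\rho(\mathsf{P}_I \varphi, u, i) = \sum_{j \in i - I,\ j \geq 0} \rho(\varphi, u, j) \qquad \rho(\mathsf{H}_I \varphi, u, i) = \prod_{j \in i - I,\ j \geq 0} \rho(\varphi, u, j)$$

$$\rho(\mathsf{F}_I \varphi, u, i) = \sum_{j \in i + I,\ j < |u|} \rho(\varphi, u, j) \qquad \rho(\mathsf{G}_I \varphi, u, i) = \prod_{j \in i + I,\ j < |u|} \rho(\varphi, u, j)$$

$$\rho(\varphi\,\mathsf{S}_I\,\psi, u, i) = \sum_{j \in i - I,\ j \geq 0} \Big( \rho(\psi, u, j) \cdot \prod_{k=j+1}^{i} \rho(\varphi, u, k) \Big)$$

$$\rho(\varphi\,\overline{\mathsf{S}}_I\,\psi, u, i) = \prod_{j \in i - I,\ j \geq 0} \Big( \rho(\psi, u, j) + \sum_{k=j+1}^{i} \rho(\varphi, u, k) \Big)$$

$$\rho(\varphi\,\mathsf{U}_I\,\psi, u, i) = \sum_{j \in i + I,\ j < |u|} \Big( \Big( \prod_{k=i}^{j-1} \rho(\varphi, u, k) \Big) \cdot \rho(\psi, u, j) \Big)$$

$$\rho(\varphi\,\overline{\mathsf{U}}_I\,\psi, u, i) = \prod_{j \in i + I,\ j < |u|} \Big( \Big( \sum_{k=i}^{j-1} \rho(\varphi, u, k) \Big) + \rho(\psi, u, j) \Big)$$


Fig. 1: Semiring-based quantitative semantics for MTL.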


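connective $X \in \{\mathsf{P}, \mathsf{H}, \mathsf{S}, \overline{\mathsf{S}}, \mathsf{F}, \mathsf{G}, \mathsf{U}, \overline{\mathsf{U}}\}$, we write $X_i$ as an abbreviation for $X_{[i,i]}$
and $X$ as an abbreviation for $X_{[0,\infty)}$.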

Since we focus in this paper on online monitoring, we restrict attention to the
future-bounded fragment of MTL, where the future-time temporal connectives
are bounded. That is, every $\mathsf{U}_I$ connective is of the form $\mathsf{U}_{[a,b]}$ for $a \leq b < \omega$
(and similarly for $\mathsf{F}_I$, $\mathsf{G}_I$, $\overline{\mathsf{U}}_I$). We always assume this restriction on formulas.

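We interpret the formulas in $\mathrm{MTL}(D, V)$ over traces from $D^\infty$ and at specific
time points. The interpretation function $\rho : \mathrm{MTL}(D, V) \times D^\infty \times \omega \to V$, where
$\rho(\varphi, u, i)$ is defined when $i < |u|$, is shown in Fig. 1. We say that the formulas
$\varphi$ and $\psi$ are equivalent, and we write $\varphi \equiv \psi$, if $\rho(\varphi, u, i) = \rho(\psi, u, i)$ for every
$u \in D^\infty$ and $i < |u|$. For every formula $\varphi$ and every interval $I$, it holds that
$\mathsf{P}_I \varphi \equiv 1\,\mathsf{S}_I\,\varphi$, $\mathsf{H}_I \varphi \equiv 0\,\overline{\mathsf{S}}_I\,\varphi$, $\mathsf{F}_I \varphi \equiv 1\,\mathsf{U}_I\,\varphi$, and $\mathsf{G}_I \varphi \equiv 0\,\overline{\mathsf{U}}_I\,\varphi$.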
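
To make the inductive semantics concrete, the following Python sketch evaluates the since, once and until connectives directly from their defining sums and products over a finite trace. It is an unoptimized illustration under stated assumptions, not the monitoring algorithm of this paper: subformulas are assumed to be already evaluated into lists of semiring values, an unbounded interval is encoded as `hi=None`, and all function names are hypothetical.

```python
from functools import reduce

INF = float("inf")

class Semiring:
    """Minimal container for the operations and constants of a semiring."""
    def __init__(self, add, mul, zero, one):
        self.add, self.mul, self.zero, self.one = add, mul, zero, one

MAX_MIN = Semiring(max, min, -INF, INF)                            # robustness values
COUNT = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0, 1)     # (N, +, *, 0, 1)

def in_window(d, lo, hi):
    """True iff the distance d lies in [lo, hi]; hi=None encodes [lo, oo)."""
    return lo <= d and (hi is None or d <= hi)

def rho_since(s, phi, psi, lo, hi, i):
    """Value of (phi S_I psi) at position i: sum over j in i-I, j >= 0,
    of psi[j] * phi[j+1] * ... * phi[i]."""
    total = s.zero
    for j in range(i + 1):
        if in_window(i - j, lo, hi):
            total = s.add(total, reduce(s.mul, phi[j + 1 : i + 1], psi[j]))
    return total

def rho_once(s, phi, lo, hi, i):
    """Value of (P_I phi) at position i: sum of phi[j] over j in i-I, j >= 0."""
    window = (phi[j] for j in range(i + 1) if in_window(i - j, lo, hi))
    return reduce(s.add, window, s.zero)

def rho_until(s, phi, psi, lo, hi, i):
    """Value of (phi U_I psi) at position i: sum over j in i+I, j < |u|,
    of phi[i] * ... * phi[j-1] * psi[j]."""
    total = s.zero
    for j in range(i, len(psi)):
        if in_window(j - i, lo, hi):
            prefix = reduce(s.mul, phi[i:j], s.one)
            total = s.add(total, s.mul(prefix, psi[j]))
    return total

phi = [3, 1, 4, 1, 5]
psi = [-2, 6, -1, 0, 2]
since_value = rho_since(MAX_MIN, phi, psi, 0, None, 4)
```

Replacing the left argument of since by the constant-1 signal (here `[INF] * 5` for the max-min semiring) reproduces `rho_once`, which is the equivalence between once and since with the constant-1 predicate on this example.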

We say that a semiring $V$ refines $\mathbb{B}$ if there is a semiring homomorphism
$h : V \to \mathbb{B}$. Notice that $h$ is necessarily an epimorphism because $h(0) = \bot$ and
$h(1) = \top$. Informally, we think of $h^{-1}(\bot)$ as the subset of “false” values and
$h^{-1}(\top)$ as the subset of “true” values. In particular, this means that $V$ can be
partitioned into true and false values. There are semirings that cannot refine $\mathbb{B}$.
For example, the semiring $(\mathbb{Z}, +, \cdot, 0, 1)$ of the integers cannot refine $\mathbb{B}$: a
homomorphism $h$ would have to satisfy $\bot = h(0) = h(1 + (-1)) = \top \vee h(-1) = \top$.

Let $h : V \to \mathbb{B}$. For a predicate $p : D \to V$, we say that $d \in D$ $h$-satisfies
$p$, and we write $d \models_h p$, if $h(p(d)) = \top$. For $u \in D^\infty$ and $i < |u|$ we define the
satisfaction relation $\models_h$ as usual (for atomic formulas: $u, i \models_h p$ iff $u(i) \models_h p$).


Lemma 2. Let $D$ be a set of data items, $V$ be a semiring, and $h : V \to \mathbb{B}$. The
following are equivalent:

(1) The function $h$ is a semiring homomorphism.

(2) $u, i \models_h \varphi$ iff $h(\rho(\varphi, u, i)) = \top$ for every $\varphi : \mathrm{MTL}(D, V)$, $u \in D^\infty$ and $i < |u|$.


Lemma 2 says that the qualitative semantics $\models_h$ agrees with the quantitative
semantics $\rho$ exactly when $h : V \to \mathbb{B}$ is a semiring homomorphism. In this case,
$\rho$ is more fine-grained and loses no information regarding Boolean satisfaction.


Lemma 3. Let $D$ be a set of data items and $V$ be a semiring. The identities of
Fig. 2 hold for all formulas $\varphi, \psi \in \mathrm{MTL}(D, V)$.
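$\mathsf{P}\varphi \equiv \mathsf{P}_1(\mathsf{P}\varphi) \vee \varphi$ and $\mathsf{H}\varphi \equiv \mathsf{H}_1(\mathsf{H}\varphi) \wedge \varphi$
$\varphi\,\mathsf{S}\,\psi \equiv (\mathsf{P}_1(\varphi\,\mathsf{S}\,\psi) \wedge \varphi) \vee \psi$
$\mathsf{P}_{[a,\infty)}\varphi \equiv \mathsf{P}_a \mathsf{P}\varphi$ and $\mathsf{H}_{[a,\infty)}\varphi \equiv \mathsf{H}_a \mathsf{H}\varphi$
$\varphi\,\mathsf{S}_{[a,\infty)}\,\psi \equiv \mathsf{P}_a(\varphi\,\mathsf{S}\,\psi) \wedge \mathsf{H}_{[0,a-1]}\varphi$, for $a \geq 1$
$\mathsf{P}_{[a,b]}\varphi \equiv \mathsf{P}_a \mathsf{P}_{[0,b-a]}\varphi$ and $\mathsf{H}_{[a,b]}\varphi \equiv \mathsf{H}_a \mathsf{H}_{[0,b-a]}\varphi$
$\varphi\,\mathsf{S}_{[a,b]}\,\psi \equiv \mathsf{P}_a(\varphi\,\mathsf{S}_{[0,b-a]}\,\psi) \wedge \mathsf{H}_{[0,a-1]}\varphi$, for $a \geq 1$
$\mathsf{F}_{[0,a]}\varphi \equiv \mathsf{F}_a \mathsf{P}_{[0,a]}\varphi$ and $\mathsf{G}_{[0,a]}\varphi \equiv \mathsf{G}_a \mathsf{H}_{[0,a]}\varphi$
$\mathsf{F}_{[a,b]}\varphi \equiv \mathsf{F}_b \mathsf{P}_{[0,b-a]}\varphi$ and $\mathsf{G}_{[a,b]}\varphi \equiv \mathsf{G}_b \mathsf{H}_{[0,b-a]}\varphi$
$\varphi\,\mathsf{U}_{[a,b]}\,\psi \equiv \mathsf{G}_{[0,a-1]}\varphi \wedge \mathsf{F}_a(\varphi\,\mathsf{U}_{[0,b-a]}\,\psi)$, for $a \geq 1$


Fig. 2: Equivalences between temporal formulas.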


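The identities of Fig. 2 are all shown using the semiring axioms. The identity
below can be used to reduce the monitoring of $\mathsf{S}_{[0,a]}$ to $\mathsf{P}_{[0,a]}$.


$$\varphi\,\mathsf{S}_{[0,a]}\,\psi \equiv (\varphi\,\mathsf{S}\,\psi) \wedge \mathsf{P}_{[0,a]}\,\psi \qquad (1)$$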


An early occurrence of this idea is in [19], which considers the more general
(future-time) form $\varphi\,\mathsf{U}_{[a,b]}\,\psi \equiv (\varphi\,\mathsf{U}_{[a,\infty)}\,\psi) \wedge \mathsf{F}_{[a,b]}\,\psi$. Prior work on efficient
monitoring [15] uses an algorithm based on it. Specifically, [15] uses a sliding-max
algorithm [32], which can be applied to the max-min semiring $\mathbb{R}^{\pm\infty}$ and other
similar linear orders, but is not applicable to partial orders or other semirings.


Proposition 4. For a set $D$ with at least two elements and a semiring $V$, the
following are equivalent:

(1) The semiring $V$ is a bounded distributive lattice.

(2) Equivalence (1) holds for all formulas $\varphi, \psi \in \mathrm{MTL}(D, V)$.


Proposition 4 gives a precise characterization of when the identity applies.
This characterization is axiomatic and identifies the class of bounded distributive
lattices as the most general class for which the identity is valid. One important
implication is that monitors that are based on this identity cannot be used for
other semirings such as $(\mathbb{R}, +, \cdot, 0, 1)$ and $(\mathbb{N}, +, \cdot, 0, 1)$.
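
A small experiment illustrates both directions of this characterization. The Python sketch below is an illustration only, with hypothetical helper names; it reads identity (1) as "since over the window $[0,a]$ equals unbounded since multiplied by once over $[0,a]$". It checks the identity exhaustively on short max-min traces and exhibits a two-step trace on which it fails in $(\mathbb{N}, +, \cdot, 0, 1)$.

```python
from functools import reduce
from itertools import product

INF = float("inf")

def since(add, mul, zero, phi, psi, lo, hi, i):
    """(phi S_[lo,hi] psi) at position i; hi=None encodes an unbounded window."""
    total = zero
    for j in range(i + 1):
        if lo <= i - j and (hi is None or i - j <= hi):
            total = add(total, reduce(mul, phi[j + 1 : i + 1], psi[j]))
    return total

def once(add, zero, psi, lo, hi, i):
    """(P_[lo,hi] psi) at position i."""
    return reduce(add, (psi[j] for j in range(i + 1) if lo <= i - j <= hi), zero)

def identity_1_holds(add, mul, zero, phi, psi, a, i):
    """Compare both sides of identity (1) at position i."""
    lhs = since(add, mul, zero, phi, psi, 0, a, i)
    rhs = mul(since(add, mul, zero, phi, psi, 0, None, i),
              once(add, zero, psi, 0, a, i))
    return lhs == rhs

# Max-min semiring (a bounded distributive lattice): exhaustive check on
# all traces of length 3 over a small set of values.
values = (-1, 0, 2)
maxmin_ok = all(
    identity_1_holds(max, min, -INF, phi, psi, a, i)
    for phi in product(values, repeat=3)
    for psi in product(values, repeat=3)
    for a in range(3)
    for i in range(3)
)

# (N, +, *, 0, 1): with phi = psi = 1 everywhere and a = 0, the left-hand
# side at position 1 is 1, whereas the right-hand side is 2 * 1 = 2.
plus = lambda x, y: x + y
times = lambda x, y: x * y
arith_ok = identity_1_holds(plus, times, 0, (1, 1), (1, 1), 0, 1)
```

In the arithmetic semiring the unbounded since counts two witnesses, so multiplying it with the windowed once over-counts.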


Example 5 (Uncertainty). We want to identify a notion of quantitative truth
values in situations where we interpret formulas over a signal $x[n]$ that is not
known with perfect accuracy, but we can put an upper and lower bound on each
sample, i.e., $a \leq x[n] \leq b$. For example, suppose that we know that
$99.9 \leq x[0] \leq 100.1$ and we want to evaluate the atomic predicate $p$ = “$x > 99$” at time 0. The
truth value can be taken to be the interval $[0.9, 1.1]$ in this case, since there is
uncertainty in the distance of the signal value from the threshold.

More concretely, this situation of uncertain input signal can arise in the mon- 
itoring of systems where the raw signal is captured at one site, then compressed 
and transmitted to another site for monitoring. In many resource-constrained 
settings (e.g., certain IoT systems), the signal has to be compressed with a lossy 
compression scheme in order to meet network bandwidth constraints. So, at the 
monitoring site, the exact signal values are not known but can possibly be placed
within intervals (depending on the compression scheme used).

In order to model this kind of uncertainty, we consider the set $\mathcal{I}(\mathbb{R}^{\pm\infty})$ of
intervals of the form $[a, b]$ with $a \leq b$ and $a, b \in \mathbb{R}^{\pm\infty}$. An interval $[a, b] \subseteq \mathbb{R}^{\pm\infty}$ can
be thought of as an uncertain truth value (it can be any one of those contained in
$[a, b]$). For intervals $[a, b]$ and $[c, d]$ we define $[a, b] \oplus [c, d] = [\max(a, c), \max(b, d)]$
and $[a, b] \otimes [c, d] = [\min(a, c), \min(b, d)]$. An interval of the form $[a, a]$ is equal to
the singleton set $\{a\}$. The structure $(\mathcal{I}(\mathbb{R}^{\pm\infty}), \oplus, \otimes, \{-\infty\}, \{\infty\})$ is a semiring.

The semiring $\mathcal{I}(\mathbb{R}^{\pm\infty})$ is a partial order (more specifically, it is a bounded
distributive lattice) and therefore does not fit existing monitoring frameworks
that consider only linear orders (e.g., the max-min semiring $\mathbb{R}^{\pm\infty}$ of the extended
reals and the associated sliding-max/min algorithms).
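
The interval semiring is easy to prototype. The Python sketch below is an illustration with hypothetical names (`i_add`, `i_mul`, `i_leq`, `greater_than`), representing an interval $[a, b]$ as a pair `(a, b)`. It reproduces the truth value of “$x > 99$” from Example 5, under the assumption that an uncertain sample $[lo, hi]$ is mapped to $[lo - c, hi - c]$, and exhibits two incomparable intervals.

```python
INF = float("inf")

def i_add(x, y):
    """Semiring addition: componentwise max of the interval bounds."""
    return (max(x[0], y[0]), max(x[1], y[1]))

def i_mul(x, y):
    """Semiring multiplication: componentwise min of the interval bounds."""
    return (min(x[0], y[0]), min(x[1], y[1]))

ZERO = (-INF, -INF)   # the singleton {-oo}
ONE = (INF, INF)      # the singleton {+oo}

def i_leq(x, y):
    """Order induced by addition: x <= y iff x (+) y = y."""
    return i_add(x, y) == y

def greater_than(c):
    """Quantitative predicate "x > c" lifted to uncertain samples (lo, hi)."""
    def predicate(sample):
        lo, hi = sample
        return (lo - c, hi - c)
    return predicate

# Example 5: the sample is known to lie in [99.9, 100.1]; threshold 99.
truth = greater_than(99.0)((99.9, 100.1))   # approximately (0.9, 1.1)

# The induced order is only partial: neither interval is below the other.
x, y = (0.0, 3.0), (1.0, 2.0)
incomparable = not i_leq(x, y) and not i_leq(y, x)
```

Because such incomparable values exist, a sliding-window maximum that relies on a total order has no direct counterpart here.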


3 Symbolic Quantitative Traces and Languages 

In this section we start with our investigation of how to generalize the “robustness
degree” of [22] to our abstract algebraic setting. The result of [22] that
relates the robustness degree with the robust semantics is an inequality. For this
reason, we focus on idempotent semirings, for which there is a natural partial
order $\leq$ that is induced by semiring addition ($x \leq y$ iff $x + y = y$). Since our
approach is abstract algebraic (i.e., axiomatic), we have no notion of real-valued
distance between elements of $D$. Moreover, $V$ does not need to be a semiring
of real numbers. Instead, we rely on the intuition that for an atomic predicate
$p : D \to V$ and a data item $d \in D$, the value $p(d)$ gives a degree of truth or
falsity. We propose using symbolic traces $x = p_0 p_1 \ldots p_{n-1}$, which are sequences
of atomic predicates, in order to compactly represent sets of concrete traces,
which are sequences of data items. If each $p_i$ represents a subset $S_i \subseteq D$, then $x$
represents the set $L = S_0 \times S_1 \times \cdots \times S_{n-1} = \{v_0 v_1 \ldots v_{n-1} \mid v_i \in S_i \text{ for each } i\}$
of concrete traces. Moreover, given a concrete trace $u = u_0 u_1 \ldots u_{n-1} \in D^n$, we
can use the value $p_0(u_0) \cdot p_1(u_1) \cdots p_{n-1}(u_{n-1}) \in V$ as a quantitative measure
of how close the trace $u$ is to the set of traces $L$. We propose the interpretation
of a formula $\varphi$ as a language of symbolic traces. This will allow us to define the
“closeness” of a trace $u \in D^n$ to the specification $\varphi$ as a (semiring) sum of all the
closeness values w.r.t. each symbolic trace in the symbolic language of $\varphi$. We will
also see that this interpretation of a formula $\varphi$ as a symbolic language is compatible
with the standard interpretation of $\varphi$ as a set of concrete traces. Using
these definitions we obtain a generalization of the theorem of [22] that relates
the robustness degree with the robust semantics. Additionally, we characterize
precisely the class of semirings for which this generalization is possible.

Let $V$ be an idempotent semiring. For predicates $p, q : D \to V$ we define $p \leq q$
if $p(d) \leq q(d)$ for every $d \in D$. The intuition for $p \leq q$ is that $p$ is a stronger
predicate than $q$. We write $F(D, V)$ to denote the set of atomic quantitative
predicates, which always includes the predicates 1 and 0. For symbolic traces
$x, y \in F(D, V)^\lambda$ with $\lambda = |x| = |y|$ we define $x \leq y$ if $x(i) \leq y(i)$ for every
$i < \lambda$. These relations $\leq$ on predicates and traces are partial orders. We define
the symbolic satisfaction relation $\models$, where $x, i \models \varphi$ says that the formula $\varphi : \mathrm{MTL}(D, V)$
is satisfied by the symbolic trace $x \in F(D, V)^\infty$ at position $i < |x|$.
For atomic formulas, we put $x, i \models p$ iff $x(i) \leq p$. The definition is given by
induction on $\varphi$ in the usual way. For a formula $\varphi : \mathrm{MTL}(D, V)$, length $\lambda \in \omega \cup \{\omega\}$
and a position $i < \lambda$, we define the symbolic language $\mathrm{SL}(\varphi, \lambda, i) = \{x \in F(D, V)^\lambda \mid x, i \models \varphi\}$.
For nonempty finite traces $x \in F(D, V)^n$ and $u \in D^n$ of
the same length, we define $x[u] = \prod_{i < n} x(i)(u(i))$, where $n = |x| = |u|$. Since
the semiring multiplication is monotone w.r.t. $\leq$, we see that $x \leq y$ implies
$x[u] \leq y[u]$ for every $u \in D^n$. Informally, the value $x[u]$ quantifies how close the
concrete trace $u$ is to the symbolic trace $x$.


Example 6. Let $D = \mathbb{R}$ and $V = \mathbb{R}^{\pm\infty}$. For $c \in \mathbb{R}$, the predicate $p$ = “$x > c$”
is defined by $p(d) = d - c$ for every $d \in D$. The predicate $q$ = “$x < c$” is given by
$q(d) = c - d$ for every $d \in D$. For the symbolic trace $x$ = “$x > 1$” “$x < 5$” “$x > 2$”
and the concrete trace $u = 3\ 6\ 8$ we get that $x[u] = \min(2, -1, 6) = -1$.

Let $c, d \in \mathbb{R}$. For the predicates $p$ = “$x > c$” and $q$ = “$x > d$” we have that
$p \leq q$ iff $d \leq c$. Similarly, for the predicates $p$ = “$x < c$” and $q$ = “$x < d$” it holds
that $p \leq q$ iff $c \leq d$. Finally, notice that the predicates “$x > c$” and “$x < d$” are
incomparable. Consider $y$ = “$x > 0$” “$x < 7$” “$x > 1$” and observe that $x \leq y$.

For the formula $\varphi = p \wedge \mathsf{F}_1 q$, where $p$ and $q$ are atomic predicates, we have
that $\mathrm{SL}(\varphi, 2, 0) = \{p'q' \in F(D, V)^2 \mid p' \leq p \text{ and } q' \leq q\}$.
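
The closeness computation of Example 6 can be replayed in a few lines. The following Python sketch is an illustration only: the helpers `gt`, `lt` and `closeness` are hypothetical names, and the multiplication is specialized to the max-min semiring, where the product of a symbolic trace applied to a concrete trace is a minimum.

```python
from functools import reduce

INF = float("inf")

def gt(c):
    """Quantitative predicate "x > c": value d - c."""
    return lambda d: d - c

def lt(c):
    """Quantitative predicate "x < c": value c - d."""
    return lambda d: c - d

def closeness(sym_trace, trace):
    """x[u] in the max-min semiring: the product (min) of x(i)(u(i)) over all i."""
    assert len(sym_trace) == len(trace)
    values = (p(d) for p, d in zip(sym_trace, trace))
    return reduce(min, values, INF)   # INF is the multiplicative unit

u = [3, 6, 8]
x = [gt(1), lt(5), gt(2)]   # "x > 1" "x < 5" "x > 2"
y = [gt(0), lt(7), gt(1)]   # "x > 0" "x < 7" "x > 1"

x_u = closeness(x, u)       # min(2, -1, 6)
y_u = closeness(y, u)       # min(3, 1, 7)
```

Since every predicate of the first symbolic trace is pointwise below the corresponding predicate of the second, monotonicity of multiplication gives `x_u <= y_u`.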


The definition of the robustness degree in [22] involves the value
$-\mathrm{dist}(u, L) = -\inf_{v \in L} \mathrm{dist}(u, v) = \sup_{v \in L} -\mathrm{dist}(u, v)$, where $u$ is a trace, $L$ is a set of traces,
and dist is a metric. Notice that this is a supremum over a potentially infinite
set. The semirings that we have considered so far have an addition operation
that can model a finitary supremum. In order to model an infinitary supremum,
we need to consider semirings that have an infinitary addition operation. A
complete semiring is an algebraic structure $(V, +, \sum, \cdot, 0, 1)$, where $\sum_{i \in I} x_i$ is the
sum of the $I$-indexed tuple of elements $(x_i)_{i \in I}$, that satisfies: (1) $\sum_{i \in \emptyset} x_i = 0$,
$\sum_{i \in \{j\}} x_i = x_j$, $\sum_{i \in \{j,k\}} x_i = x_j + x_k$ for $j \neq k$, and $\sum_{k \in K} \sum_{i \in I_k} x_i = \sum_{i \in I} x_i$
where $I = \bigcup_{k \in K} I_k$ and the index sets $(I_k)_{k \in K}$ are pairwise disjoint, (2) $(V, \cdot, 1)$
is a monoid, (3) the infinite distributivity properties $(\sum_{i \in I} x_i) \cdot y = \sum_{i \in I} (x_i \cdot y)$
and $x \cdot (\sum_{i \in I} y_i) = \sum_{i \in I} (x \cdot y_i)$ hold for every index set $I$ and all $x, y, x_i, y_i \in V$, and
(4) $0$ is an annihilator for multiplication. A complete semiring $V$ is idempotent
if $\sum_{i \in I} x_i = x$ for every non-empty index set $I$ with $x_i = x$ for every $i \in I$. For
example, $(\mathbb{R}^{\pm\infty}, \max, \sup, \min, -\infty, +\infty)$ is an idempotent complete semiring.
For a formula $\varphi : \mathrm{MTL}(D, V)$, a trace $u \in D^+$ and $i < n = |u|$, we define


$$\mathrm{val}(\varphi, u, i) = \sum_{x \in \mathrm{SL}(\varphi, n, i)} x[u]. \qquad (2)$$


Informally, $\mathrm{val}(\varphi, u, i)$ is a measure of how close the trace $u$ is to satisfying $\varphi$ at
position $i$. It is an abstract algebraic variant of the robustness degree [22].


Theorem 7 (Approximation). Let $D$ be a set of data items and $V$ be an
idempotent complete semiring. Then, the following are equivalent:
(1) The multiplication of $V$ is idempotent and 1 is the top element of $V$.
(2) For every $\varphi : \mathrm{MTL}(D, V)$, $u \in D^+$ and $i < |u|$, $\mathrm{val}(\varphi, u, i) \leq \rho(\varphi, u, i)$.


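Proof. Assume that (1) holds. Let $n \geq 1$ be an integer. For a symbolic language
$\mathcal{L} \subseteq F(D, V)^n$ and for $u \in D^n$, we define $\mathrm{val}(\mathcal{L}, u) = \sum_{x \in \mathcal{L}} x[u]$. Let $\{\mathcal{L}_i\}_{i \in I}$ be
a collection of languages with $\mathcal{L}_i \subseteq F(D, V)^n$. Then,


$$\mathrm{val}\Big(\bigcup_{i \in I} \mathcal{L}_i, u\Big) = \sum_{x \in \bigcup_{i \in I} \mathcal{L}_i} x[u] = \sum_{i \in I} \sum_{x \in \mathcal{L}_i} x[u] = \sum_{i \in I} \mathrm{val}(\mathcal{L}_i, u). \qquad (3)$$


For symbolic languages $\mathcal{L}_1, \mathcal{L}_2 \subseteq F(D, V)^n$, define $\mathcal{L} = \mathcal{L}_1 \cap \mathcal{L}_2$, $\mathcal{L}_1' = \mathcal{L}_1 \setminus \mathcal{L}_2$
and $\mathcal{L}_2' = \mathcal{L}_2 \setminus \mathcal{L}_1$. Then, $\mathcal{L}_1 = \mathcal{L}_1' \cup \mathcal{L}$ and $\mathcal{L}_2 = \mathcal{L}_2' \cup \mathcal{L}$. The languages $\mathcal{L}_1', \mathcal{L}_2', \mathcal{L}$
are pairwise disjoint. So, we have that $\mathrm{val}(\mathcal{L}_1, u) = x + z$ and $\mathrm{val}(\mathcal{L}_2, u) = y + z$,
where $x = \mathrm{val}(\mathcal{L}_1', u)$, $y = \mathrm{val}(\mathcal{L}_2', u)$ and $z = \mathrm{val}(\mathcal{L}, u)$. It follows that


$$\mathrm{val}(\mathcal{L}_1 \cap \mathcal{L}_2, u) = z = zz \leq (x + z)(y + z) = \mathrm{val}(\mathcal{L}_1, u) \cdot \mathrm{val}(\mathcal{L}_2, u) \qquad (4)$$


by the idempotence of multiplication. This property extends to
$\mathrm{val}(\mathcal{L}_1 \cap \cdots \cap \mathcal{L}_k, u) \leq \mathrm{val}(\mathcal{L}_1, u) \cdots \mathrm{val}(\mathcal{L}_k, u)$. Now, we will prove (2) by induction on $\varphi$.

– For the base case we have $\mathrm{SL}(p, n, i) = \{x \in F(D, V)^n \mid x(i) \leq p\}$. Define
$y \in F(D, V)^n$ by $y(i) = p$ and $y(j) = 1$ for every $j \neq i$. That is, $y = 1^i\, p\, 1^{n-i-1}$.
For every $x \in \mathrm{SL}(p, n, i)$ we have $x(i) \leq p$ and therefore $x \leq y$
(since 1 is the top element of $V$). It follows that $x[u] \leq y[u] = p(u(i))$. So,
$\mathrm{val}(p, u, i) = \sum_{x \in \mathrm{SL}(p, n, i)} x[u] \leq p(u(i)) = \rho(p, u, i)$.

– For the case of disjunction, we have $\mathrm{SL}(\varphi \vee \psi, n, i) = \mathrm{SL}(\varphi, n, i) \cup \mathrm{SL}(\psi, n, i)$. It
follows that $\mathrm{val}(\varphi \vee \psi, u, i) \leq \mathrm{val}(\varphi, u, i) + \mathrm{val}(\psi, u, i) \leq \rho(\varphi, u, i) + \rho(\psi, u, i) = \rho(\varphi \vee \psi, u, i)$
by the induction hypothesis and (3).

– For the case of conjunction we observe that $\mathrm{val}(\varphi \wedge \psi, u, i) = \mathrm{val}(\mathrm{SL}(\varphi, n, i) \cap \mathrm{SL}(\psi, n, i), u)
\leq \mathrm{val}(\varphi, u, i) \cdot \mathrm{val}(\psi, u, i) \leq \rho(\varphi, u, i) \cdot \rho(\psi, u, i) = \rho(\varphi \wedge \psi, u, i)$
by the induction hypothesis and (4).

The rest of the cases $\mathsf{S}, \overline{\mathsf{S}}, \mathsf{U}, \overline{\mathsf{U}}$ can be dealt with similarly using (3) and (4).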

The proof that (2) implies (1) is not too difficult, and we therefore omit it. 


Theorem 7 could be considered an abstract algebraic counterpart of the result
of [22] (page 4268, Theorem 13) for discrete finite traces. We will discuss
later how it can be used to obtain the original result (for the max-min semiring
$\mathbb{R}^{\pm\infty}$) as a corollary. Additionally, Theorem 7 gives a precise equational
characterization of the class of semirings for which the relationship between the two
semantics holds.

Let $D$ be a set of data items, $V$ be a semiring and $h : V \to \mathbb{B}$. For a formula $\varphi : \mathrm{MTL}(D, V)$,
length $\lambda \in \omega \cup \{\omega\}$ and $i < \lambda$, we define the concrete trace language
$\mathrm{CL}_h(\varphi, \lambda, i) = \{u \in D^\lambda \mid u, i \models_h \varphi\}$. For a symbolic trace $x \in F(D, V)^\lambda$, we
define its (concrete) trace language by $\mathrm{CL}_h(x) = \{u \in D^\lambda \mid u \models_h x\}$, where
$u \models_h x$ means that $u(i) \models_h x(i)$ for every $i < \lambda$. Lemma 8 below establishes
a correspondence between the symbolic and concrete language of a formula $\varphi$,
which we need to connect Theorem 7 to the concrete setting of [22].


Lemma 8 (Concrete and Symbolic Languages). Let $D$ be a set of data
items, $V$ be an idempotent semiring with top element 1, and $h : V \to \mathbb{B}$ be a
semiring homomorphism. For every formula $\varphi : \mathrm{MTL}(D, V)$, length $\lambda \in \omega \cup \{\omega\}$,
and position $i < \lambda$, it holds that $\mathrm{CL}_h(\varphi, \lambda, i) = \bigcup_{x \in \mathrm{SL}(\varphi, \lambda, i)} \mathrm{CL}_h(x)$.
4 Relationship with Robust Semantics


In this section, we consider the concrete quantitative setting where $V$ is the max-min
semiring $\mathbb{R}^{\pm\infty}$. We will obtain the result of [22] that relates the robustness
degree with the robust semantics as a consequence of Theorem 7.

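A metric space is a set $M$ together with a function $\mathrm{dist} : M \times M \to \mathbb{R}_{\geq 0}$, called
a metric, satisfying: (1) $\mathrm{dist}(x, y) = 0$ iff $x = y$ for all $x, y \in M$, (2) $\mathrm{dist}(x, y) =
\mathrm{dist}(y, x)$ for all $x, y \in M$, and (3) $\mathrm{dist}(x, z) \leq \mathrm{dist}(x, y) + \mathrm{dist}(y, z)$ for all $x, y, z \in M$.
Given a metric dist on $M$ we define the distance function Dist as follows:


$$\mathrm{dist} : M \times \mathcal{P}(M) \to \mathbb{R}^{\infty}_{\geq 0}, \qquad \mathrm{dist}(x, S) = \inf_{y \in S} \mathrm{dist}(x, y), \qquad \mathrm{dist}(x, \emptyset) = \infty$$

$$\mathrm{Dist} : M \times \mathcal{P}(M) \to \mathbb{R}^{\pm\infty}, \qquad \mathrm{Dist}(d, S) = \begin{cases} -\mathrm{dist}(d, S), & \text{if } d \notin S \\ \mathrm{dist}(d, {\sim}S), & \text{if } d \in S \end{cases}$$


where ${\sim}S = M \setminus S$ is the complement of $S$. Notice that $\mathrm{Dist}(x, \emptyset) = -\infty$.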
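
For a finite metric space, the signed distance can be computed directly from this definition. The Python sketch below is an illustration with hypothetical names (`dist_to_set`, `signed_dist`); it assumes that the space is a finite set of integers with the metric $|x - y|$, and it treats the infimum over the empty set as $\infty$.

```python
import math

def dist_to_set(x, S, metric):
    """dist(x, S): infimum of the distances from x to the points of S (oo if S is empty)."""
    return min((metric(x, y) for y in S), default=math.inf)

def signed_dist(d, S, M, metric):
    """Dist(d, S): positive depth inside S, negative distance outside S."""
    if d in S:
        return dist_to_set(d, M - S, metric)   # distance to the complement of S
    return -dist_to_set(d, S, metric)

# Assumed toy space: the points 0..9 with the usual distance on integers.
M = frozenset(range(10))
S = frozenset({4, 5, 6})
metric = lambda x, y: abs(x - y)

inside = signed_dist(5, S, M, metric)              # nearest points outside S are 3 and 7
outside = signed_dist(1, S, M, metric)             # nearest point of S is 4
empty = signed_dist(5, frozenset(), M, metric)     # no point of S to approach
```

The sign of the result tells whether the point belongs to the set, and the magnitude tells how far it is from changing membership.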

Let $D$ be a metric space of points (data items). Let $p$ be a propositional
letter (symbol), and $\mathcal{O}(p) \subseteq D$ be its interpretation, that is, the set of points
for which $p$ is true. The corresponding quantitative predicate is $p : D \to \mathbb{R}^{\pm\infty}$
given by $p(d) = \mathrm{Dist}(d, \mathcal{O}(p))$ for every $d \in D$. Given the metric dist on $D$, we
obtain a metric $\mathrm{dist} : D^\lambda \times D^\lambda \to \mathbb{R}^{\infty}_{\geq 0}$ (on the set of traces of length $\lambda$, where
$\lambda \in \omega \cup \{\omega\}$) as follows: $\mathrm{dist}(u, v) = \sup_{i < \lambda} \mathrm{dist}(u(i), v(i))$. Let
$\mathrm{CL}_{\mathcal{O}}(\varphi, n, i) = \{u \in D^n \mid u, i \models_{\mathcal{O}} \varphi\}$ be the set of traces (of length $n$) that satisfy $\varphi$ at $i$ (defined
using the interpretation function $\mathcal{O}$). Corollary 9 below was proved in [22]. We
will give a proof that relies on the algebraic variant that we presented earlier.


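Corollary 9. Let $D$ be a set of data items, and $V = \mathbb{R}^{\pm\infty}$. Let $\varphi : \mathrm{MTL}(D, V)$,
$u \in D^n$ and $i < n$ (where $n \geq 1$). Then, $-\mathrm{dist}(u, \mathrm{CL}_{\mathcal{O}}(\varphi, n, i)) \leq \rho(\varphi, u, i)$.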


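Proof. We will use the semiring $\mathbb{R}^{\pm\infty}_{\pm 0} = \mathbb{B} \times \mathbb{R}^{\infty}_{\geq 0}$ instead of $\mathbb{R}^{\pm\infty}$, so that the
value 0 is not ambiguous (it can be either true or false when we use $\mathbb{R}^{\pm\infty}$).
That is, we will have a positive zero $+0$ (true) and a negative zero $-0$ (false).
The semiring homomorphism $h : \mathbb{R}^{\pm\infty}_{\pm 0} \to \mathbb{B}$ sends the positive (resp., negative)
elements to $\top$ (resp., $\bot$). We will interpret a predicate symbol $p$ as the quantitative
predicate $p : D \to \mathbb{R}^{\pm\infty}_{\pm 0}$ given by $p(d) = -\mathrm{dist}(d, \mathcal{O}(p))$ if $d \notin \mathcal{O}(p)$
and $p(d) = +\mathrm{dist}(d, {\sim}\mathcal{O}(p))$ if $d \in \mathcal{O}(p)$. Using these definitions, the satisfaction
relations $\models_{\mathcal{O}}$ and $\models_h$ are the same, hence $\mathrm{CL}_{\mathcal{O}}$ and $\mathrm{CL}_h$ are the same. Now,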


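$$\begin{aligned}
\mathrm{dist}(u, \mathrm{CL}_h(\varphi, n, i)) &= \mathrm{dist}\Big(u, \bigcup_{x \in \mathrm{SL}(\varphi, n, i)} \mathrm{CL}_h(x)\Big) && [\text{Lemma 8}] \\
&= \inf_{x \in \mathrm{SL}(\varphi, n, i)} \ \inf_{v \in \mathrm{CL}_h(x)} \ \sup_{j < n} \mathrm{dist}(u(j), v(j)) && [\text{def. of dist}] \\
&\geq \inf_{x \in \mathrm{SL}(\varphi, n, i)} \ \sup_{j < n} \ \inf_{v \in \mathrm{CL}_h(x)} \mathrm{dist}(u(j), v(j)) && [\sup \inf \leq \inf \sup] \\
&= \inf_{x \in \mathrm{SL}(\varphi, n, i)} \ \sup_{j < n} \ \inf_{v(j) \in \mathcal{O}(x(j))} \mathrm{dist}(u(j), v(j)) && [\text{def. of CL}] \\
&= \inf_{x \in \mathrm{SL}(\varphi, n, i)} \ \sup_{j < n} \mathrm{dist}(u(j), \mathcal{O}(x(j))). && [\text{def. of dist}]
\end{aligned}$$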
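By negating the above inequality we get that


$$-\mathrm{dist}(u, \mathrm{CL}_h(\varphi, n, i)) \leq \sup_{x \in \mathrm{SL}(\varphi, n, i)} \ \inf_{j < n} -\mathrm{dist}(u(j), \mathcal{O}(x(j))),$$


which is $\leq \sum_{x \in \mathrm{SL}(\varphi, n, i)} x[u] = \mathrm{val}(\varphi, u, i)$. From Theorem 7 we get $\mathrm{val}(\varphi, u, i) \leq
\rho(\varphi, u, i)$ and therefore $-\mathrm{dist}(u, \mathrm{CL}_{\mathcal{O}}(\varphi, n, i)) \leq \rho(\varphi, u, i)$.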


From Corollary 9 we can also obtain $\rho(\varphi, u, i) \leq \mathrm{dist}(u, {\sim}\mathrm{CL}_{\mathcal{O}}(\varphi, n, i))$. This
inequality is equivalent to $-\mathrm{dist}(u, {\sim}\mathrm{CL}_{\mathcal{O}}(\varphi, n, i)) \leq -\rho(\varphi, u, i)$, which in turn is
equivalent to $-\mathrm{dist}(u, \mathrm{CL}_{\mathcal{O}}({\sim}\varphi, n, i)) \leq \rho({\sim}\varphi, u, i)$. The operation ${\sim}$ on formulas
is a pseudo-negation, that is, ${\sim}\varphi$ is the formula that results by “dualizing” all
connectives and negating the atomic predicates. This operation is meaningful
for the semiring $\mathbb{R}^{\pm\infty}$. The final inequality is an instance of Corollary 9 for ${\sim}\varphi$.

Theorem 7 and Corollary 9 are not used later for the monitoring algorithm.
The significance of our theorem is that it can be instantiated to give the existing
result from [22]. This serves as a sanity check for our algebraic framework and
it supports the semiring-based semantics of Sect. 2.


5 Online Monitoring 


For an infinite input trace $u \in D^\omega$, the output of the monitor for the time instant
$t$ should be $\rho(\varphi, u, t)$, but the monitor has to compute it by observing only a
finite prefix of $u$. In order for the output value of the monitor to agree with the
standard temporal semantics over infinite traces we may need to delay an output
item until some part of the future input is seen. For example, in the case of $\mathsf{F}_1 p$
we need to wait for one time unit: the output at time $t$ is given after the input
item at time $t + 1$ is seen. In other words, the monitor for $\mathsf{F}_1 p$ has a delay (the
output is falling behind the input) of one time unit. Symmetrically, we can allow
monitors to emit output early when the correct value is known. For example,
the output value for $\mathsf{P}_1 p$ is 0 in the beginning and the value at time $t$ is already
known from time $t - 1$. So, we also allow monitors to have negative delay (the
output is running ahead of the input). The function $\mathrm{dl} : \mathrm{MTL} \to \mathbb{Z}$ gives the
amount of delay required to monitor a formula. It is defined by $\mathrm{dl}(p) = 0$ and


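$$\begin{aligned}
\mathrm{dl}(\varphi \wedge \psi) &= \max(\mathrm{dl}(\varphi), \mathrm{dl}(\psi)) & \mathrm{dl}(\varphi\,\mathsf{S}_{[a,b]}\,\psi) &= \max(\mathrm{dl}(\varphi), \mathrm{dl}(\psi)) - a \\
\mathrm{dl}(\varphi\,\mathsf{S}_{[a,\infty)}\,\psi) &= \max(\mathrm{dl}(\varphi), \mathrm{dl}(\psi)) - a & \mathrm{dl}(\varphi\,\mathsf{U}_{[a,b]}\,\psi) &= \max(\mathrm{dl}(\varphi), \mathrm{dl}(\psi)) + b.
\end{aligned}$$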
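
The delay function is a straightforward recursion over the formula. The Python sketch below is an illustration under stated assumptions: the small syntax tree (`Atom`, `And`, `Since`, `Until`) is hypothetical, and it reads the defining equations as "conjunction takes the maximum delay of its operands, since over a window starting at $a$ subtracts $a$, and until over $[a, b]$ adds $b$". The unary connectives are encoded through the constant-1 predicate, following the equivalences of Sect. 2.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Since:
    left: "Formula"
    right: "Formula"
    a: int
    b: Optional[int]      # None encodes the unbounded interval [a, oo)

@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"
    a: int
    b: int                # future connectives are always bounded

Formula = Union[Atom, And, Since, Until]

def dl(f):
    """Delay (in time units) of the monitor output relative to its input."""
    if isinstance(f, Atom):
        return 0
    m = max(dl(f.left), dl(f.right))
    if isinstance(f, And):
        return m
    if isinstance(f, Since):
        return m - f.a    # past values are known early: negative delay
    if isinstance(f, Until):
        return m + f.b    # must wait until the end of the future window
    raise TypeError(f"unsupported formula: {f!r}")

ONE = Atom("1")                                # the constant-1 predicate
p = Atom("p")
eventually_1 = Until(ONE, p, 1, 1)             # F_1 p
once_1 = Since(ONE, p, 1, 1)                   # P_1 p
```

With this encoding, `dl(eventually_1)` is one time unit of delay and `dl(once_1)` is one time unit of negative delay, matching the two examples in the text.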


The monitor $\mathrm{TL}(\varphi)$ for a formula $\varphi$ is a variant of a Mealy machine. If $\mathrm{dl}(\varphi) = 0$,
then $\mathrm{TL}(\varphi)$ is precisely a Mealy machine (one output item per input item) with
inputs $D$ and outputs $V$. If $\ell = \mathrm{dl}(\varphi) > 0$, then $\mathrm{TL}(\varphi)$ emits no output for the
first $\ell$ steps and then behaves like a Mealy machine. If $\ell = \mathrm{dl}(\varphi) < 0$, then $\mathrm{TL}(\varphi)$
emits $-\ell$ items upon initialization and continues to behave like a Mealy machine.

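Let $A$ and $B$ be sets. A monitor of type $M(A, B)$ is a state machine $G =
(\mathrm{St}, \mathrm{init}, o, \mathrm{next}, \mathrm{out})$, where St is a set of states, $\mathrm{init} \in \mathrm{St}$ is the initial state,
$o \in B^*$ is the initial output, $\mathrm{next} : \mathrm{St} \times A \to \mathrm{St}$ is the transition function, and
$\mathrm{out} : \mathrm{St} \times A \to \mathrm{Opt}(B)$ is the output function, where $\mathrm{Opt}(B) = B \cup \{\mathrm{nil}\}$.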


map(op) : M(A, B)
  St = Unit
  init = u
  o = ε
  next(s, a) = s
  out(s, a) = op(a)

aggr(b, op) : M(A, B)
  St = B
  init = b
  o = ε
  next(s, a) = op(s, a)
  out(s, a) = op(s, a)

emit(n, v) : M(A, A)
  St = Unit
  init = u
  o = v^n
  next(s, a) = s
  out(s, a) = a

ignore(n) : M(A, A)
  St = [0, n]
  init = 0
  o = ε
  next(s, a) = s + 1, if s < n
  next(s, a) = s, if s = n
  out(s, a) = nil, if s < n
  out(s, a) = a, if s = n

wnd(n, v, op) : M(A, A)
  St = Buf(A)
  init = Buf(n, v)
  o = ε
  next(s, a) = s.ins(a)
  out(s, a) = s.ins(a).agg(op)

wndV(n, op) : M(A, A)
  St = Buf(A)
  init = Buf()
  o = ε
  next(s, a) = s.ins(a)
  out(s, a) = ε, if size(s) < n − 1
  out(s, a) = s.ins(a).agg(op), o/w


Fig. 3: Basic building blocks for constructing temporal quantitative monitors.


In Fig. 3 we give several examples of simple monitors that can be used as
building blocks. The monitor map(op) applies the function op : A → B
elementwise. The monitor aggr(b, op) applies a running aggregation to the input
trace that is specified by the initial aggregate b : B and the aggregation function
op : B × A → B (similar to the fold combinator used in functional programming).
The monitor emit(n, v) emits n copies of the value v ∈ A upon initialization and
then echoes the input trace. The monitor ignore(n) discards the first n items of
the trace and proceeds to echo the rest of the trace. The monitor wnd(n, v, op)
performs an aggregation, given by the associative function op : A × A → A, over
a sliding window of size n. It initializes the window using the value v : A and
emits output at the arrival of every item. The monitor wndV(n, op) is different
in that it starts with an empty window and it only starts emitting output when
the window fills up with n items. We will combine monitors using the operations
serial composition >> and parallel composition par. In the serial composition
G >> H the output trace of G is propagated as input trace to H. In the parallel
composition par(G, H) the input trace is copied to two concurrently executing
monitors G and H and their output traces are combined. Both combinators >>
and par are given by variants of the product construction on state machines.
In the case of par the output traces of G and H may not be synchronized (one
may be ahead of the other), which requires some bounded buffering in order to
properly align them. The construction for par is described in [37]. Some variants
of the combinators of Fig. 3 are part of the StreamQL language [29], which
has been proposed for the processing of streaming time series.
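
To make the building blocks concrete, the following Python sketch models a monitor as the tuple (init, o, next, out) and gives one possible reading of map, aggr, emit, ignore and serial composition. It is an illustration only and not the Rust implementation evaluated later; the names Monitor, NIL, run, map_ and serial are assumptions introduced here, and the product construction used for serial composition is one plausible variant.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

NIL = object()  # hypothetical marker for the "nil" (no output) element of Opt(B)


@dataclass
class Monitor:
    init: Any                        # initial state
    o: List[Any]                     # initial output, emitted upon initialization
    nxt: Callable[[Any, Any], Any]   # transition function next(s, a)
    out: Callable[[Any, Any], Any]   # output function out(s, a); may return NIL


def run(m: Monitor, trace) -> List[Any]:
    """Feed a finite input trace to m and collect its output trace."""
    outputs, state = list(m.o), m.init
    for a in trace:
        b = m.out(state, a)
        if b is not NIL:
            outputs.append(b)
        state = m.nxt(state, a)
    return outputs


def map_(op):        # state None plays the role of the single Unit value
    return Monitor(None, [], lambda s, a: s, lambda s, a: op(a))


def aggr(b, op):     # running fold: the new state is also the output
    return Monitor(b, [], op, op)


def emit(n, v):      # n copies of v up front, then echo the input
    return Monitor(None, [v] * n, lambda s, a: s, lambda s, a: a)


def ignore(n):       # count up to n, suppressing output until then
    return Monitor(0, [],
                   lambda s, a: s + 1 if s < n else s,
                   lambda s, a: NIL if s < n else a)


def serial(g: Monitor, h: Monitor) -> Monitor:
    """Sketch of g >> h as a product machine over pairs of states."""
    sh, o = h.init, list(h.o)
    for b in g.o:    # h first consumes the initial output of g
        c = h.out(sh, b)
        if c is not NIL:
            o.append(c)
        sh = h.nxt(sh, b)

    def nxt(s, a):
        sg, sh_ = s
        b = g.out(sg, a)
        return (g.nxt(sg, a), sh_ if b is NIL else h.nxt(sh_, b))

    def out(s, a):
        sg, sh_ = s
        b = g.out(sg, a)
        return NIL if b is NIL else h.out(sh_, b)

    return Monitor((g.init, sh), o, nxt, out)
```

For instance, run(serial(emit(2, 0), aggr(0, lambda s, a: s + a)), [1, 2]) first aggregates the two emitted zeros and then the input items. Parallel composition is omitted because it needs the buffering discussed above.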

The identities of Fig. 2 suggest that MTL monitoring can be reduced to a
small set of computational primitives. In fact, the primitives described earlier are
sufficient to specify the monitors, as shown in Fig. 4. We write π_1 : A × B → A
for the left projection and π_2 : A × B → B for the right projection.

Let u ∈ D⁺ and n = |u|. If n > a then ρ(φ S_[0,a] ψ, u, n − 1) = ρ(φ S ψ, v, a),
where v is the suffix of u with a + 1 items. If n ≤ a then ρ(φ S_[0,a] ψ, u, n − 1) =
ρ(φ S ψ, 0^{a+1−n} u, a). So, we can implement a monitor for the connective S_[0,a]
by computing S over a window of exactly a + 1 data items.


Monitors:

    TL(p) = map(p)
    TL(φ ∨ ψ) = par(TL(φ), TL(ψ)) >> map(+)
    TL(P φ) = TL(φ) >> aggr(0, +)
    TL(P_a φ) = TL(φ) >> emit(a, 0)
    TL(P_[a,∞) φ) = TL(P_a P φ)
    TL(φ S ψ) = par(TL(φ), TL(ψ)) >> aggr(0, opS)
        opS(s, (x, y)) = (s · x) + y
    TL(φ S_[a,∞) ψ) = TL(P_a(φ S ψ) ∧ H_[0,a−1] φ)
    TL(φ S_[0,b] ψ) = par(TL(φ), TL(ψ)) >> wnd(b + 1, 0, ⊗_S) >> map(π_2)
    TL(φ S_[a,b] ψ) = TL(P_a(φ S_[0,b−a] ψ) ∧ H_[0,a−1] φ)
    TL(F_a φ) = TL(φ) >> ignore(a)
    TL(F_[a,b] φ) = TL(F_a F_[0,b−a] φ)
    TL(φ U_[0,b] ψ) = par(TL(φ), TL(ψ)) >> wndV(b + 1, ⊗_U) >> map(π_2)
    TL(φ U_[a,b] ψ) = TL(G_[0,a−1] φ ∧ F_a(φ U_[0,b−a] ψ))

Sliding aggregation:

    // fill buffer with v (initial values)
    T[n] buf ← [v, ..., v]
    // calculate partial aggregates
    for i ← n − 2 to 0 do
        buf[i] ← buf[i] ⊗ buf[i + 1]
    // initial total aggregate
    T agg ← buf[0]
    Nat m ← 0    // size of new block
    T z ← nil    // aggregate of new block

    Function Add(T d):
        if m = n then    // full new block
            // convert new block to old
            for i ← n − 2 to 1 do
                buf[i] ← buf[i] ⊗ buf[i + 1]
            m ← 0    // empty new block
            z ← nil
        // evict oldest item, replace with d
        buf[m] ← d
        m ← m + 1    // new block enlarged
        z ← z ⊗ d    // where nil ⊗ d = d
        if m < n then
            agg ← buf[m] ⊗ z
        else    // m = n
            agg ← z


Fig. 4: Online monitors for bounded-future MTL formulas & sliding aggregation.


Proposition 10 (Aggregation for S, U). Let V be a semiring. For every trace
u = u_0 u_1 ... u_{n−1} ∈ (V × V)* of length n = |u|, the values ρ(π_1 S π_2, u, n − 1) and
ρ(π_1 U π_2, u, 0) can be written as aggregates of the form π_2(u_0 ⊗ u_1 ⊗ ··· ⊗ u_{n−1}).


Proposition 10 justifies the translation of S_[0,a]/U_[0,a] into monitors (Fig. 4).
Now, we will describe the data structure that performs the sliding aggregation.
It is used in Fig. 3 in the monitors wnd and wndV. The implementation is shown
in Fig. 4. Suppose that the current window (of size n) is [x_0, x_1, ..., x_{n−1}]. We
maintain a buffer of the form [x_{n−m}, ..., x_{n−1}, y_0, ..., y_{n−1−m}], where the part
[x_{n−m}, ..., x_{n−1}] is the block of newer elements (“new block”) and the part
[y_0, ..., y_{n−1−m}] contains aggregates of the older elements (“old block”). They
satisfy the invariant y_i = x_i ⊗ ··· ⊗ x_{n−1−m} for every i = 0, ..., n − 1 − m. We also
maintain the aggregate z = x_{n−m} ⊗ ··· ⊗ x_{n−1} of the new block. So, the overall
aggregate of the window is agg = y_0 ⊗ z. When a new item d arrives, we evict the
aggregate y_0 corresponding to the oldest item x_0 and replace it by d. Thus, the
new block is expanded with the additional item d and therefore we also update
the aggregates z and agg. When the new block becomes full (i.e., m = n) then
we convert it to an old block by performing all partial aggregations from right
to left. This conversion requires n − 1 applications of ⊗, but it is performed once
every n items. So, the algorithm needs O(1) amortized time-per-item.
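
The two-block scheme described above can be sketched in Python as follows. This is a transcription of the pseudocode of Fig. 4 under stated assumptions rather than the authors' Rust code: the class name SlidingAggregator is hypothetical, None stands for nil, and op is assumed to be associative (it need not be commutative).

```python
class SlidingAggregator:
    """Sliding-window aggregation with O(1) amortized applications of op per item."""

    def __init__(self, n, v, op):
        self.n, self.op = n, op
        # Fill the buffer with the initial value v, then turn it into an
        # "old block" of suffix aggregates: buf[i] = x_i op ... op x_{n-1}.
        self.buf = [v] * n
        for i in range(n - 2, -1, -1):
            self.buf[i] = op(self.buf[i], self.buf[i + 1])
        self.agg = self.buf[0]  # aggregate of the whole window
        self.m = 0              # size of the new block
        self.z = None           # aggregate of the new block (None plays nil)

    def add(self, d):
        n, op = self.n, self.op
        if self.m == n:
            # The new block is full: convert it to an old block by computing
            # suffix aggregates from right to left. Index 0 is skipped because
            # it is overwritten by d immediately below.
            for i in range(n - 2, 0, -1):
                self.buf[i] = op(self.buf[i], self.buf[i + 1])
            self.m, self.z = 0, None
        # Evict the aggregate of the oldest item and store d in its slot.
        self.buf[self.m] = d
        self.m += 1
        self.z = d if self.z is None else op(self.z, d)
        # Old elements precede new ones, so the old aggregate goes on the left.
        self.agg = op(self.buf[self.m], self.z) if self.m < n else self.z
        return self.agg
```

With string concatenation as op, a window of size 3 initialized with "_" yields "__a", "_ab", "abc", "bcd" on the inputs a, b, c, d, which makes the left-to-right order of aggregation visible.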


Theorem 11. Let D be a set of data items, V be a semiring, and φ : MTL(D, V)
be a bounded-future formula. The monitor TL(φ) : M(D, V) is a streaming
algorithm that needs O(2^|φ|) space and O(|φ|) amortized time-per-item.

Proof. The algorithm needs space that is exponential in the size of φ because of
the connectives of the form X_[a,∞) and X_[a,b]. The monitor uses buffers of size a
or b − a. Since the constants a, b are written in binary notation, we need space
that is exponential in the size. The O(|φ|) amortized time per element hinges on
the algorithm of Fig. 4, which is used for S_[0,a] and U_[0,a]. As discussed earlier,
this algorithm needs O(1) amortized time-per-item.


6 Experimental Evaluation 


We have implemented our semiring-based monitoring framework in Rust. We
compare our implementation with the verified lattice-based monitors of [13]
and the monitoring tool Reelay [40]. We perform our experiments using the
(ℝ^{±∞}, max, min) semiring for truth values, which are approximately represented
using 64-bit floating-point numbers.

We have observed that all three tools process items at a roughly constant 
rate. We summarize the performance of a monitor with the average time it 
takes to process one data item (i.e., amortized time-per-item). In Fig. 5 we
consider formulas X_[0,n], X_n, X_[n,2n], X_[n,∞) where X ∈ {S, P}. We show the
time-per-item for the monitors for n = 1, 10, 10^2, 10^3, 10^4, 10^5, 10^6. We have also
evaluated how the monitors for future temporal connectives scale with respect to
the constants in the intervals. In Fig. 6 we benchmark all tools using formulas
from the Timescales benchmark [39]. Our monitors are generally more than 100 
(resp., 10) times faster than Reelay (resp., the lattice-based tool of [13]). 

The profiling tools Valgrind [38] and Heaptrack are used to analyze the 
memory consumption of the monitors. Our Rust implementation, given a for- 
mula, begins by allocating a fixed amount of memory and does not allocate 
any more memory during the rest of the computation. Reelay allocates and 
de-allocates memory throughout its execution. The lattice-based monitor is im- 
plemented in OCaml (which is a garbage-collected language) and consumes a 
larger amount of memory. In Fig. we plot the peak memory usage of the 
monitors. We note that our tool does not seem to be allocating an increasing
amount of memory for P_a and similar formulas. This is because the corresponding
monitor for P_a emits output as early as possible and therefore does not
need to use a buffer. In the case of the lattice-based monitor and our tool, we
observe that the memory consumption does not depend on the input trace (it 
only depends on the formula). In the case of Reelay, it appears that the mem- 
ory consumption depends on the input trace. We have plotted the behavior for 
two different input traces: one that consists of an increasing sequence of values 
(“reelay-ascending” ), and another one that is decreasing (“reelay-descending” ). 
We have only measured the memory usage of Reelay for up to n = 213, as the 
execution becomes very slow beyond this value. 

Fig. 5: Microbenchmark


We use case studies from the automotive domain, which have been suggested
as benchmarks for hybrid system verification [25]. The Automatic Transmission
System has two input signals (a throttle and a brake) and three output
signals: the gear sequence (g_i for each gear i), the engine rotation speed (in
rpm, denoted ω) and the vehicle speed (denoted v). Based on the suggestions
in [25], we consider five properties: A_1 = ω < ω̄, A_2 = (ω < ω̄) ∧ (v < v̄),
A_3 = g_1 ∧ Y(g_2) → Y H_[0,2.5] g_2 (where Y is notation for P_1),
A_4 = H_[0,5](ω < ω̄) → H_[0,2.5](v < v̄) and
A_5 = (v > v̄) S_[0,1] ((ω > ω̄) S_[0,2] ((¬g_4) S_[0,10] (g_3 S ((¬g_2) S (¬g_1))))).
All constants in the temporal connectives are in seconds, and
we choose the constants v̄ = 120 and ω̄ = 4500. Formula A_3 says that before
changing from the second to the first gear, at least 2.5 seconds must first pass.
Formula A_4 says that keeping the engine speed low enough should ensure that
the vehicle does not exceed a certain speed. Formula A_5 says that changing the
gear from the first to the fourth within 10 seconds, and then having the engine
speed exceed ω̄ will cause the vehicle speed to exceed v̄. The other case study is
a Fault-Tolerant Fuel Control System. We monitor two properties. The first is
that the fuel flow rate should frequently become and remain non-zero for a
sufficient amount of time. We encode this as F_1 = H_[0,10] P_[0,1](FuelFlowRate > 0).
The other property is to ensure that whenever the air-to-fuel ratio goes out of
bounds, then within 1 second it should settle back and stay there for a second.
This is written as F_2 = (H_[0,1] airFuelRatio < 1) S_[0,2] airFuelRatio < 1. The
experimental results are shown in Fig. 

All of our experiments were executed on a laptop with an Intel Core i7 
10610U CPU clocked at 2.30GHz and 16GB of memory. Each value reported is 
the mean of 20 executions of the experiment. The whiskers in the plots indicate 
the standard deviation across all executions. 


7 Related Work 


Fainekos and Pappas [22] define the robustness degree of satisfaction in terms 
of the distance of the signal from the set of desirable ones (or its complement). 
They also suggest an under-approximation of the robustness degree which can 
be effectively monitored. This is called the robust semantics and is defined by 
induction on STL formulas, by interpreting conjunction (resp., disjunction) as 
min (resp., max) of ℝ^{±∞}. Our paper explores this robust semantics (and the
related approximation guarantee) in the general algebraic setting of semirings. 
In [27], the authors study a generalization of the robustness degree by consid- 
ering idempotent semirings of real numbers. They also propose an online mon- 
itoring algorithm that uses symbolic weighted automata. While this approach 
computes the precise robustness degree in the sense of [22], the construction 
of the relevant automata incurs a doubly exponential blowup if one considers 
STL specifications. In [13], it is observed that an extension of the robust semantics
to bounded distributive lattices can be effectively monitored. In this paper,
we generalize this semantics by considering semirings (bounded distributive lat- 
tices are semirings). Semirings are also used in [9], where the authors consider 
a spatio-temporal logic. They consider the class of constraint semirings, which 
require the semiring order to induce a complete lattice. Efforts have been made 
to define notions of robustness that take temporal discrepancies into account. 
In [20], we see a definition of temporal robustness by considering the effect of 
shifting the signal in time. The “edit distance” between discretized signals is
proposed as a measure of robustness in [26]. Abbas et al. [3] define a notion of
(τ, ε)-closeness between signals, which considers temporal and value-based guarantees
separately. In [2], a metric based on conformance is put forward for applications 
in cardiac electrophysiology. Averaging temporal operators are used in [5], which 
assign a higher value to temporal obligations that are satisfied earlier. 

A key ingredient for the efficient monitoring of STL is a streaming algorithm
for sliding-window maximum [19, 15]. The tool Breach [17, 18], which is
used for the falsification of temporal specifications over hybrid systems, uses the
sliding-maximum algorithm of [32]. In contrast, we use a more general sliding 
aggregation which applies to any associative operation (not only max/min) and 
does not require the truth values to be totally ordered. 

Different approaches for interpreting future temporal connectives in the context
of online monitoring have been studied. While [16] assumes the availability
of a predictor to interpret future connectives, [21] considers robustness intervals:
the tightest intervals which cover the robustness for all possible extensions
of the available trace prefix. Reelay [40] exclusively uses past-time connectives. 
The transducer-based framework of can be used to monitor rich temporal 
properties which depend on bounded future input by allowing some bounded 
delay in the output. 

There is a large amount of work on formalisms, domain-specific languages and 
associated tools for quantitative online monitoring and, more generally, for data 
stream processing. The synchronous language LOLA has served as the basis 
for the StreamLAB tool [23], which is used for monitoring cyber-physical systems.
Quantitative Regular Expressions [36] and associated automata-theoretic
models with registers [7, 8, 6] have been used to express complex online detection
algorithms for medical monitoring [14]. There are many synchronous languages
and models of computation based on Kahn’s dataflow model that have been
used for signal processing and embedded controller design [12, 11, 10]. The
construction of online monitors described in Sect. 5 relies on a set of combinators
that constitute a simple domain-specific language for stream processing. Our
focus here, however, is on providing efficient monitors for MTL formulas with
a quantitative semantics, rather than designing a general-purpose language for
monitor specification. The compositional construction of automata-based monitors
from temporal specifications has also been considered in [34, 35, 24].


8 Conclusion 


We have presented a new efficient algorithm for the online monitoring of MTL 
properties over discrete traces. We have used an abstract algebraic semantics 
based on semirings, which can be instantiated to the widely-used Boolean (qual- 
itative) and robustness (quantitative) semantics, as well as to other partially 
ordered semirings. We also provide a theorem that relates our quantitative se- 
mantics with an algebraic generalization of the robustness degree of [22]. We 
have provided an implementation of our algebraic monitoring framework, and 
we have shown experimentally that our monitors scale reasonably well and are 
competitive against the state-of-the-art tool Reelay [40].


Abstract. We present an algorithm for extracting a subclass of the
context free grammars (CFGs) from a trained recurrent neural network 
(RNN). We develop a new framework, pattern rule sets (PRSs), which 
describe sequences of deterministic finite automata (DFAs) that approxi- 
mate a non-regular language. We present an algorithm for recovering the 
PRS behind a sequence of such automata, and apply it to the sequences 
of automata extracted from trained RNNs using the L* algorithm. We 
then show how the PRS may be converted into a CFG, enabling a familiar
and useful presentation of the learned language. 

Extracting the learned language of an RNN is important to facilitate 
understanding of the RNN and to verify its correctness. Furthermore, the 
extracted CFG can augment the RNN in classifying correct sentences, as 
the RNN’s predictive accuracy decreases when the recursion depth and
distance between matching delimiters of its input sequences increase.


Keywords: Model Extraction · Learning Context Free Grammars ·
Finite State Machines · Recurrent Neural Networks


1 Introduction 


Recurrent Neural Networks (RNNs) are a class of neural networks adapted to 
sequential input, enjoying wide use in a variety of sequence processing tasks. Their 
internal process is opaque, prompting several works into extracting interpretable 
rules from them. Existing works focus on the extraction of deterministic or 
weighted finite automata (DFAs and WFAs) from trained RNNs [18,6,26,3]. 

However, DFAs are insufficient to fully capture the behavior of RNNs, which 
are known to be theoretically Turing-complete [20], and for which there exist 
architecture variants such as LSTMs [14] and features such as stacks [9,23] 
or attention [4] increasing their practical power. Several recent investigations 
explore the ability of different RNN architectures to learn Dyck, counter, and 
other non-regular languages [19,5,28,21], with mixed results. 

While the data indicates that RNNs can generalize and achieve high accuracy, 
they do not learn hierarchical rules, and generalization deteriorates as the length 
and ‘depth’ of the input grows [19,5,28]. Sennhauser and Berwick conjecture that
“what the LSTM has in fact acquired is sequential statistical approximation to
this solution” instead of “the ‘perfect’ rule-based solution” [19]. Similarly, Yu et
al. conclude that “the RNNs can not truly model CFGs, even when powered by
the attention mechanism” [28]. This is in line with Hewitt et al., who note that a
fixed precision RNN can only learn a language of fixed depth strings [13].


Goal of this paper We wish to extract a CFG from a trained RNN. In particular, 
we wish to find the CFG that not only explains the finite language learnt by the 
RNN, but generalizes it to strings of unbounded depth and distance. 


Our approach Our method builds on the DFA extraction work of Weiss et al. 
[26], which uses the L* algorithm [2] to learn the DFA of a given RNN. As part 
of the learning process, L* creates a sequence of hypothesis DFAs approximating 
the target language. Our main insight is in treating these hypothesis DFAs as 
coming from a set of underlying rules, that recursively improve each DFA’s 
approximation of the target CFG by increasing the distance and embedded depth 
of the sequences it can recognize. In this light, synthesizing the target CFG 
becomes the problem of recovering these rules. 

We propose the framework of pattern rule sets (PRSs) for describing such 
rule applications, and present an algorithm for recovering a PRS from a sequence 
of DFAs. We also provide a method for converting a PRS to a CFG, and 
test our method on RNNs trained on several PRS languages. Pattern rule sets 
are expressive enough to cover several variants of the Dyck languages, which 
are prototypical context-free languages (CFLs): the Chomsky–Schützenberger
representation theorem shows that any CFL can be expressed as a homomorphic
image of a Dyck language intersected with a regular language [16].

A significant issue we address is that the extracted DFAs are often inexact, 
either through inaccuracies in the RNN, or as an artifact of the L* algorithm. 

To the best of our knowledge, this is the first work on synthesizing a CFG 
from a general RNN (though some works extract push-down automata [23,9] 
from RNNs with an external stack, they do not apply to plain RNNs). The overall 
steps in our technique are given in Figure 1. 


Contributions The main contributions of this paper are: 


— Pattern Rule Sets (PRSs), a framework for describing a sequence of DFAs 
approximating a CFL. 

— An algorithm for recovering the PRS generating a sequence of DFAs, that 
may also be applied to noisy DFAs elicited from an RNN using L* . 

— An algorithm converting a PRS to a CFG. 


Synthesizing Context-free Grammars from Recurrent Neural Networks 353 


— An implementation of our technique¹, and an evaluation of its success on
recovering various CFLs from trained RNNs.


2 Definitions and Notations 


2.1 Deterministic Finite Automata 


Definition 1 (Deterministic Finite Automata). A deterministic finite automaton
(DFA) over an alphabet Σ is a 5-tuple (Σ, q_0, Q, F, δ) such that Q is a
finite set of states, q_0 ∈ Q is the initial state, F ⊆ Q is a set of final (accepting)
states and δ : Q × Σ → Q is a (possibly partial) transition function.


Unless stated otherwise, we assume each DFA’s states are unique to itself, i.e.,
for any two DFAs A, B (including two instances of the same DFA) Q_A ∩ Q_B = ∅.
A DFA A is said to be complete if δ is complete, i.e., the value δ(q, σ) is defined
for every (q, σ) ∈ Q × Σ. Otherwise, it is incomplete.

We define the extended transition function δ̂ : Q × Σ* → Q and the language
L(A) accepted by A in the typical fashion. We also associate a language with
intermediate states of A: L(A, q_1, q_2) ≜ {w ∈ Σ* | δ̂(q_1, w) = q_2}. The states
from which no sequence w ∈ Σ* is accepted are known as the sink reject states.


Definition 2. The sink reject states of a DFA A = (Σ, q_0, Q, F, δ) are the
maximal set Q_R ⊆ Q satisfying: Q_R ∩ F = ∅, and for every q ∈ Q_R and σ ∈ Σ,
either δ(q, σ) ∈ Q_R or δ(q, σ) is not defined.


Definition 3 (Defined Tokens). Let A = (Σ, q_0, Q, F, δ) be a complete DFA
with sink reject states Q_R. For every q ∈ Q, its defined tokens are def(A, q) ≜
{σ ∈ Σ | δ(q, σ) ∉ Q_R}. When the DFA A is clear from context, we write def(q).


All definitions for complete DFAs are extended to incomplete DFAs A by 
considering their completion - an extension of A in which all missing transitions 
are connected to a (possibly new) sink reject state. 
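
As an illustration of Definitions 2 and 3, the following Python sketch computes the sink reject states as a greatest fixpoint and then the defined tokens of a state. The function names and the dictionary encoding of δ are assumptions made for this example; missing transitions are treated as leading to a sink reject state, in line with the completion described above.

```python
def sink_reject_states(states, alphabet, delta, final):
    """Maximal set Q_R of non-accepting states that cannot leave Q_R.

    delta is a dict mapping (state, symbol) to a state and may be partial.
    """
    sinks = set(states) - set(final)
    changed = True
    while changed:
        changed = False
        for q in list(sinks):
            for sym in alphabet:
                target = delta.get((q, sym))
                # A defined transition leaving the candidate set disqualifies q.
                if target is not None and target not in sinks:
                    sinks.discard(q)
                    changed = True
                    break
    return sinks


def defined_tokens(q, alphabet, delta, sinks):
    """Symbols whose transition from q exists and avoids the sink reject states."""
    return {sym for sym in alphabet
            if (q, sym) in delta and delta[(q, sym)] not in sinks}
```

The loop starts from all non-accepting states and only ever removes states, so it terminates with the largest set that satisfies the condition of Definition 2.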


Definition 4 (Set Representation of δ). A (possibly partial) transition function
δ : Q × Σ → Q may be equivalently defined as the set S_δ = {(q, σ, q') | δ(q, σ) =
q'}. We use δ and S_δ interchangeably.


Definition 5 (Replacing a State). For a transition function δ : Q × Σ → Q,
state q ∈ Q, and new state q_n ∉ Q, we denote by δ_[q←q_n] : Q' × Σ → Q' the
transition function over Q' = (Q \ {q}) ∪ {q_n} and Σ that is identical to δ except
that it redirects all transitions into or out of q to be into or out of q_n.


1 The implementation for this paper, and a link to all trained RNNs, is available at 
https://github.com/tech-srl/RNN_to_PRS_CFG.


2.2 Dyck Languages


A Dyck language of order N is expressed by the grammar D ::= ε | L_1 D
R_1 | ... | L_N D R_N | D D with unique symbols L_1, ..., L_N, R_1, ..., R_N. A
common measure of complexity for a Dyck word is its maximum distance (number
of characters) between matching delimiters and embedded depth (number of
unclosed delimiters) [19]. We generalize and refer to Regular Expression Dyck
(RE-Dyck) languages as languages expressed by the same CFG, except that each
L_i and each R_i derive some regular expression.

We present regular expressions as is standard, for example: L({a|b}·c) =
{ac, bc}.
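
The two complexity measures can be made concrete with a short Python sketch. It checks membership in a (plain, single-character) Dyck language and reports the maximum distance and depth; the function name dyck_measures and the dictionary encoding of delimiter pairs are assumptions for this example, and distance is read as the number of characters strictly between a matching pair.

```python
def dyck_measures(word, pairs):
    """Return (max_distance, max_depth) for a Dyck word, or None if word is not one.

    pairs maps each left delimiter L_i to its right delimiter R_i.
    """
    closers = set(pairs.values())
    stack = []  # entries are (position of opener, expected closer)
    max_distance = max_depth = 0
    for i, ch in enumerate(word):
        if ch in pairs:
            stack.append((i, pairs[ch]))
            max_depth = max(max_depth, len(stack))  # unclosed delimiters so far
        elif ch in closers:
            if not stack or stack[-1][1] != ch:
                return None  # closer without a matching opener
            j, _ = stack.pop()
            max_distance = max(max_distance, i - j - 1)
        else:
            return None  # symbol outside the alphabet
    return (max_distance, max_depth) if not stack else None
```

For the order-2 word "([])()" this gives distance 2 (between the outer parentheses) and depth 2.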


3 Patterns 


Patterns are DFAs with a single exit state q_X in place of a set of final states, and
with no cycles on their initial or exit states unless q_0 = q_X.


Definition 6 (Patterns). A pattern p = (Σ, q_0, Q, q_X, δ) is a DFA A^p =
(Σ, q_0, Q, {q_X}, δ), satisfying: 1. L(A^p) ≠ ∅, and 2. either q_0 = q_X, or def(q_X) = ∅
and L(A^p, q_0, q_0) = {ε}. If q_0 = q_X then p is called circular, otherwise, it is
non-circular. Patterns are always given in minimal incomplete presentation.


We refer to a pattern’s initial and exit states as its edge states. All the
definitions for DFAs apply to patterns through A^p. We denote each pattern p’s
language L_p = L(p), and if it is marked by some superscript i, we refer to all of
its components with superscript i: p^i = (Σ, q_0^i, Q^i, q_X^i, δ^i).


3.1 Pattern Composition 


We can compose two non-circular patterns p^1, p^2 by merging the exit state of p^1
with the initial state of p^2, creating a new pattern p^3 satisfying L_{p^3} = L_{p^1} · L_{p^2}.


Definition 7 (Serial Composition). Let p^1, p^2 be two non-circular patterns.
Their serial composite is the pattern p^1 ∘ p^2 = (Σ, q_0^1, Q, q_X^2, δ) in which Q =
Q^1 ∪ Q^2 \ {q_X^1} and δ = δ^1_[q_X^1←q_0^2] ∪ δ^2. We call q_0^2 the join state of this operation.


If we additionally merge the exit state of p^2 with the initial state of p^1, we
obtain a circular pattern p which we call the circular composition of p^1 and p^2.
This composition satisfies L_p = {L_{p^1} · L_{p^2}}*.


Definition 8 (Circular Composition). Let p^1, p^2 be two non-circular patterns.
Their circular composite is the circular pattern p^1 ∘_c p^2 = (Σ, q_0^1, Q, q_0^1, δ) in which
Q = Q^1 ∪ Q^2 \ {q_X^1, q_X^2} and δ = δ^1_[q_X^1←q_0^2] ∪ δ^2_[q_X^2←q_0^1]. We call q_0^2 the join state
of this operation.
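
The two compositions can be illustrated on the set representation of δ (Definition 4) with the following Python sketch. It is a minimal reading of Definitions 5, 7 and 8, not the released implementation: the names Pattern, replace_state, serial, circular and accepts are hypothetical, and the state sets of the two patterns are assumed to be disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    q0: str               # initial state
    qx: str               # exit state
    states: frozenset
    delta: frozenset      # set of (source, symbol, target) triples


def replace_state(delta, q, qn):
    """Redirect every transition into or out of q to qn (Definition 5)."""
    ren = lambda s: qn if s == q else s
    return frozenset((ren(s), a, ren(t)) for (s, a, t) in delta)


def serial(p1, p2):
    """Merge the exit state of p1 into the initial state of p2 (the join state)."""
    states = (p1.states | p2.states) - {p1.qx}
    delta = replace_state(p1.delta, p1.qx, p2.q0) | p2.delta
    return Pattern(p1.q0, p2.qx, states, delta)


def circular(p1, p2):
    """As serial, and additionally merge the exit state of p2 into p1's initial state."""
    states = (p1.states | p2.states) - {p1.qx, p2.qx}
    delta = (replace_state(p1.delta, p1.qx, p2.q0)
             | replace_state(p2.delta, p2.qx, p1.q0))
    return Pattern(p1.q0, p1.q0, states, delta)


def accepts(p, word):
    """Run the pattern as a DFA whose only accepting state is its exit state."""
    step = {(s, a): t for (s, a, t) in p.delta}
    q = p.q0
    for ch in word:
        if (q, ch) not in step:
            return False
        q = step[(q, ch)]
    return q == p.qx
```

Composing the single-letter patterns for ‘a’ and ‘b’ serially accepts exactly "ab", while their circular composite accepts the words of (ab)*.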


Figure 2 shows 3 examples of serial and circular compositions of patterns. 
Patterns do not carry information about whether or not they have been 
composed from other patterns. We maintain such information using pattern pairs.


Fig. 2. Examples of the composition operator


Definition 9 (Pattern Pair). A pattern pair is a pair (P, P_c) of pattern sets,
such that P_c ⊆ P and for every p ∈ P_c there exists exactly one pair p^1, p^2 ∈ P
satisfying p = p^1 ⊙ p^2 for some ⊙ ∈ {∘, ∘_c}. We refer to the patterns p ∈ P_c as
the composite patterns of (P, P_c), and to the rest as its base patterns.


We will often discuss patterns that have been composed into larger DFAs.


Definition 10 (Pattern Instances). Let A = (Σ, q_0^A, Q_A, F, δ_A) be a DFA,
p = (Σ, q_0, Q, q_X, δ) be a pattern, and p̂ = (Σ, q_0', Q', q_X', δ') be a pattern ‘inside’
A, i.e., Q' ⊆ Q_A and δ' ⊆ δ_A. We say that p̂ is an instance of p in A if p̂ is
isomorphic to p.


A pattern instance p̂ in a DFA A is uniquely determined by its structure and
initial state: (p, q). If p is a composite pattern with respect to some pattern pair
(P, P_c), the join state of its composition within A is also uniquely defined.


Definition 11. For every pattern pair (P, P_c), for each composite pattern p ∈ P_c,
DFA A, and initial state q of an instance p̂ of p in A, join(p, q, A) returns the
join state of p̂ with respect to its composition in (P, P_c).


4 Pattern Rule Sets 


For any infinite sequence S = A_1, A_2, ... of DFAs satisfying L(A_i) ⊆ L(A_{i+1}) for
all i, we define the language of S as the union of the languages of all these DFAs:
L(S) = ∪_i L(A_i). Such sequences may be used to express CFLs.

In this work we take a finite sequence A_1, A_2, ..., A_n of DFAs, and assume it
is a (possibly noisy) finite prefix of an infinite sequence of approximations for a
language, as above. We attempt to reconstruct the language by guessing how the
sequence may continue. To allow such generalization, we must make assumptions
about how the sequence is generated. For this we introduce pattern rule sets.

Pattern rule sets (PRSs) create sequences of DFAs with a single accepting
state. Each PRS is built around a pattern pair (P, P_c), and each rule application
connects a new pattern instance to the current DFA A_i, at the join state of
a composite pattern inserted into A_i at some earlier point. To define where a
pattern can be connected to A_i, we introduce an enabled instance set 𝓘.


Definition 12. An enabled DFA over a pattern pair (P, P_c) is a tuple (A, 𝓘)
such that A = (Σ, q_0, Q, F, δ) is a DFA and 𝓘 ⊆ P_c × Q marks enabled instances
of composite patterns in A.


Intuitively, for every enabled DFA (A, 𝓘) and (p, q) ∈ 𝓘, we know: (i) there is
an instance of pattern p in A starting at state q, and (ii) this instance is enabled;
i.e., we may connect new pattern instances to its join state join(p, q, A).


Definition 13. A PRS 𝐏 is a tuple (Σ, P, P_c, R) where (P, P_c) is a pattern pair
over the alphabet Σ and R is a set of rules. Each rule has one of the following
forms, for some p, p^1, p^2, p^3, p^I ∈ P, with p^1 and p^2 non-circular:

(1) ⊥ → p^I

(2) p →_c (p^1 ⊙ p^2) ∘= p^3, where p = p^1 ⊙ p^2 for ⊙ ∈ {∘, ∘_c}, and p^3 is circular
(3) p →_s (p^1 ∘ p^2) ∘= p^3, where p = p^1 ∘ p^2 and p^3 is non-circular


A PRS derives sequences of enabled DFAs as follows: first, a rule of type (1)
creates (A_1, 𝓘_1) according to p^I. Then, for every (A_i, 𝓘_i), each rule may connect
a new pattern instance to A_i, specifically at a state determined by 𝓘_i.


Definition 14 (Initial Composition). D_1 = (A_1, 𝓘_1) is generated from a rule
⊥ → p^I as follows: A_1 = A^{p^I}, and 𝓘_1 = {(p^I, q_0^I)} if p^I ∈ P_c and otherwise 𝓘_1 = ∅.


Let D_i = (A_i, 𝓘_i) be the enabled DFA at step i and denote A_i = (Σ, q_0, Q, F, δ).
Note that for A_1, |F| = 1, and for all A_{i+1}, F is unchanged (by future definitions).

Rules of type (1) extend A; by grafting a circular pattern to qo, and then 
enabling that pattern if it is composite. 


Definition 15 (Rules of type (1)). A rule ⊥ → p^I with circular p^I may extend
(A_i, 𝓘_i) at the initial state q_0 of A_i iff def(q_0) ∩ def(q_0^I) = ∅. This creates the DFA
A_{i+1} = (Σ, q_0, Q ∪ Q^I \ {q_0^I}, F, δ ∪ δ^I_[q_0^I←q_0]). If p^I ∈ P_c then 𝓘_{i+1} = 𝓘_i ∪ {(p^I, q_0)},
else 𝓘_{i+1} = 𝓘_i.


Rules of type (2) graft a circular pattern p^3 = (Σ, q_0^3, Q^3, q_X^3, δ^3) onto the join
state q_j of an enabled pattern instance p̂ in A_i, by merging q_0^3 with q_j. In doing
so, they also enable the patterns composing p̂, if they are composite.


Definition 16 (Rules of type (2)). A rule p →_c (p^1 ⊙ p^2) ∘= p^3 may extend
(A_i, 𝓘_i) at the join state q_j = join(p, q, A_i) of any instance (p, q) ∈ 𝓘_i, provided
def(q_j) ∩ def(q_0^3) = ∅. This creates (A_{i+1}, 𝓘_{i+1}) as follows: A_{i+1} = (Σ, q_0, Q ∪
Q^3 \ {q_0^3}, F, δ ∪ δ^3_[q_0^3←q_j]), and 𝓘_{i+1} = 𝓘_i ∪ {(p^k, q^k) | p^k ∈ P_c, k ∈ {1, 2, 3}}, where
q^1 = q and q^2 = q^3 = q_j.


[Figure: four panels. (i) p^1 ∘ p^2 →_c (p^1 ∘ p^2) ∘= p^3; (ii) p^1 ∘_c p^2 →_c (p^1 ∘_c p^2) ∘= p^3;
(iii) p^1 ∘ p^2 →_s (p^1 ∘ p^2) ∘= p^3; (iv) minimized representation. Legend: initial state,
exit state, join state, transitions added to successor DFA, transitions in original
DFA that are not part of p^1 ∘ p^2.]


Fig. 3. Structure of DFA after applying rule of type 2 or type 3


Example applications of rule (2) are shown in Figures 3(i) and 3(ii). 

We also wish to graft a non-circular pattern p^3 between p^1 and p^2, but this
time we must avoid connecting the exit state q_X^3 to q_j lest we loop over p^3
multiple times. We therefore replicate the outgoing transitions of q_j in p^1 ∘ p^2 to
the inserted state q_X^3 so that they may act as the connections back into the DFA.


Definition 17 (Rules of type (3)). A rule p →_s (p^1 ∘ p^2) ∘= p^3 may extend
(A_i, 𝓘_i) at the join state q_j = join(p, q, A_i) of any instance (p, q) ∈ 𝓘_i, provided
def(q_j) ∩ def(q_0^3) = ∅. This creates (A_{i+1}, 𝓘_{i+1}) as follows: A_{i+1} = (Σ, q_0, Q ∪
Q^3 \ {q_0^3}, F, δ ∪ δ^3_[q_0^3←q_j] ∪ C) where C = {(q_X^3, σ, δ(q_j, σ)) | σ ∈ def(p^2, q_0^2)}, and
𝓘_{i+1} = 𝓘_i ∪ {(p^k, q^k) | p^k ∈ P_c, k ∈ {1, 2, 3}} where q^1 = q and q^2 = q^3 = q_j.


We call C the connecting transitions. We depict an example of this rule
application in Fig. 3(iii), in which a member of C is labeled ‘c’.

Multiple applications of rules of type (3) to the same instance p will create 
several equivalent states in the resulting DFAs, as all of their exit states will 
have the same connecting transitions. These states are merged in a minimized 
representation, as depicted in Diagram (iv) of Figure 3. 

We write A € G(P) if there exists a sequence of enabled DFAs derived from 
P s.t. A = A; for some A; in this sequence. 


Definition 18 (Language of a PRS). The language of a PRS $P$ is the union
of the languages of the DFAs it can generate: $L(P) = \bigcup_{A \in G(P)} L(A)$.


4.1 Examples


Example 1: Let $p^1$ and $p^2$ be the patterns accepting ‘a’ and ‘b’ respectively.
Consider the PRS $R_{ab}$ with rules $\bot \twoheadrightarrow p^1 \circ p^2$ and
$p^1 \circ p^2 \twoheadrightarrow_s (p^1 \circ p^2) \circ\!= (p^1 \circ p^2)$.
This PRS creates only one sequence of DFAs. Once the first rule creates the initial
DFA, by continuously applying the second rule we obtain the infinite sequence of
DFAs each satisfying $L(A_i) = \{a^j b^j : 1 \le j \le i\}$, and so $L(R_{ab}) = \{a^i b^i : i > 0\}$.
Figure 2(i) presents $A_1$, while $A_2$ and $A_3$ appear in Figure 4(i). We can substitute
any non-circular patterns for $p^1$ and $p^2$, creating the language $\{x^i y^i : i > 0\}$ for
any non-circular pattern regular expressions $x$ and $y$.


Fig. 4. DFA sequences for $R_{ab}$ and $R_{Dyck2}$


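Example 2: Let $p^1$, $p^2$, $p^3$, and $p^4$ be the non-circular patterns accepting ‘(’, ‘)’,
‘[’, and ‘]’ respectively. Let $p^5 = p^1 \circ_c p^2$ and $p^6 = p^3 \circ_c p^4$. Let $R_{Dyck2}$ be the PRS
containing rules $\bot \twoheadrightarrow p^5$, $\bot \twoheadrightarrow p^6$,
$p^5 \twoheadrightarrow_c (p^1 \circ_c p^2) \circ\!= p^5$, $p^5 \twoheadrightarrow_c (p^1 \circ_c p^2) \circ\!= p^6$,
$p^6 \twoheadrightarrow_c (p^3 \circ_c p^4) \circ\!= p^5$, and $p^6 \twoheadrightarrow_c (p^3 \circ_c p^4) \circ\!= p^6$.
$R_{Dyck2}$ defines the Dyck language of order 2. Figure 4 (ii) shows one of its possible DFA-sequences.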


5 PRS Inference Algorithm 


A PRS can generate a sequence of DFAs defining, in the limit, a context-free 
language. We are now interested in inverting this process: given a sequence of DFAs 
generated by a PRS P, can we reconstruct P? Coupled with an L* extraction of 
DFAs from a trained RNN, solving this problem will enable us to extract a PRS 
from an RNN - provided the extraction follows a PRS (as we often find it does). 

We present an algorithm for this problem, and show its correctness. In practice 
the DFAs we are given are not “perfect”; they contain noise that deviates from 
the PRS. We therefore augment this algorithm, allowing it to operate smoothly 
even on imperfect DFA sequences created from RNN extraction. 

In the following, for each pattern instance $\hat{p}$ in $A_i$, we denote by $p$ the pattern
that it is an instance of. We use similar notation $\hat{p}^1$, $\hat{p}^2$, and $\hat{p}^I$ to refer to specific
instances of patterns $p^1$, $p^2$ and $p^I$. Additionally, for each consecutive DFA pair
$A_i$ and $A_{i+1}$, we refer by $\hat{p}^3$ to the new pattern instance in $A_{i+1}$.


Main steps of inference algorithm. Given a sequence of DFAs $S = A_1 \cdots A_n$, the
algorithm infers $P = \langle \Sigma, P, P_c, R \rangle$ in the following stages:


1. Discover the initial pattern instance $\hat{p}^I$ in $A_1$. Insert $p^I$ into $P$ and mark $\hat{p}^I$
as enabled. Insert the rule $\bot \twoheadrightarrow p^I$ into $R$.

2. For $i$, $1 \le i \le n-1$:
(a) Discover the new pattern instance $\hat{p}^3$ in $A_{i+1}$ that extends $A_i$.
(b) If $\hat{p}^3$ starts at the state $q_0$ of $A_{i+1}$, then it is an application of a rule of
type (1). Insert $p^3$ into $P$, mark $\hat{p}^3$ as enabled, and add $\bot \twoheadrightarrow p^3$ to $R$.
(c) Otherwise ($\hat{p}^3$ does not start at $q_0$), find the unique enabled pattern
$\hat{p} = \hat{p}^1 \circ \hat{p}^2$ in $A_i$ s.t. $\hat{p}^3$'s initial state $q$ is the join state of $\hat{p}$. Add $p^1$, $p^2$,
and $p^3$ to $P$, $p$ to $P_c$, and mark $\hat{p}^1$, $\hat{p}^2$, and $\hat{p}^3$ as enabled. If $\hat{p}^3$ is
non-circular, add $p \twoheadrightarrow_s (p^1 \circ p^2) \circ\!= p^3$ to $R$; otherwise add
$p \twoheadrightarrow_c (p^1 \circ p^2) \circ\!= p^3$.

3. Define $\Sigma$ to be the set of symbols used by the patterns $P$.


We now elaborate on how we determine the patterns $\hat{p}^I$, $\hat{p}^3$, and $\hat{p}$.


Discovering new patterns $\hat{p}^I$ and $\hat{p}^3$ $A_1$ provides an initial pattern $p^I$. For
subsequent DFAs, we need to identify which states in $A_{i+1} = \langle \Sigma, q'_0, Q', F', \delta' \rangle$
are ‘new’ relative to $A_i = \langle \Sigma, q_0, Q, F, \delta \rangle$. From the PRS definitions, we know
that there is a subset of states and transitions in $A_{i+1}$ that is isomorphic to $A_i$:


Definition 19. (Existing states and transitions) For every $q' \in Q'$, we say that
$q'$ exists in $A_i$ with parallel state $q \in Q$ iff there exists a sequence $w \in \Sigma^*$ such
that $q = \hat{\delta}(q_0, w)$, $q' = \hat{\delta}'(q'_0, w)$, and neither is a sink reject state. Additionally,
for every $q'_1, q'_2 \in Q'$ with parallel states $q_1, q_2 \in Q$, we say that $(q'_1, \sigma, q'_2) \in \delta'$
exists in $A_i$ iff $(q_1, \sigma, q_2) \in \delta$. We denote $A_{i+1}$'s existing states and transitions
by $Q_E \subseteq Q'$ and $\delta_E \subseteq \delta'$, and the new ones as $Q_N = Q' \setminus Q_E$ and $\delta_N = \delta' \setminus \delta_E$.


By construction of PRSs, each state in $A_{i+1}$ has at most one parallel state in
$A_i$, which can be found in one simultaneous traversal of the two DFAs.
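
The simultaneous traversal can be sketched as a breadth-first walk over both DFAs in lock step. This is only an illustration under stated assumptions: DFAs are encoded as nested dictionaries, a missing entry stands in for a transition to the sink reject state, and the function names and the toy DFAs `A1` and `A2` are hypothetical (they are not taken from the paper or its implementation).

```python
from collections import deque

def parallel_states(delta_old, q0_old, delta_new, q0_new):
    """Map states of the newer DFA to their parallel states in the older DFA.

    Each delta is a nested dict {state: {symbol: next_state}}. A missing entry
    is treated as a move to the sink reject state (assumption of this sketch).
    """
    parallel = {q0_new: q0_old}
    queue = deque([q0_new])
    while queue:
        q_new = queue.popleft()
        q_old = parallel[q_new]
        # Follow only symbols that are defined in both DFAs.
        for symbol, nxt_old in delta_old.get(q_old, {}).items():
            nxt_new = delta_new.get(q_new, {}).get(symbol)
            if nxt_new is not None and nxt_new not in parallel:
                parallel[nxt_new] = nxt_old
                queue.append(nxt_new)
    return parallel

def new_states_and_transitions(delta_old, delta_new, parallel):
    """Return (Q_N, delta_N): the states and transitions without a parallel."""
    all_trans = {(s, a, t) for s, row in delta_new.items() for a, t in row.items()}
    all_states = set(delta_new) | {t for _, _, t in all_trans} | set(parallel)
    existing_trans = {
        (s, a, t) for (s, a, t) in all_trans
        if s in parallel and t in parallel
        and delta_old.get(parallel[s], {}).get(a) == parallel[t]
    }
    return all_states - set(parallel), all_trans - existing_trans

# Toy input: A1 accepts "ab"; A2 additionally accepts "aabb"
# (one graft of "ab" at the join state, with connecting transition n2 -b-> s2).
A1 = {"q0": {"a": "q1"}, "q1": {"b": "q2"}}
A2 = {"s0": {"a": "s1"}, "s1": {"b": "s2", "a": "n1"},
      "n1": {"b": "n2"}, "n2": {"b": "s2"}}

parallel = parallel_states(A1, "q0", A2, "s0")
Q_N, delta_N = new_states_and_transitions(A1, A2, parallel)
print(parallel)
print(sorted(Q_N), sorted(delta_N))
```

In this toy case the states `n1` and `n2` have no parallel, and the three transitions touching them form the new pattern instance together with its connecting transition.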

The new states and transitions form a new pattern instance $\hat{p}$ in $A_{i+1}$,
excluding its initial and possibly its exit state. The initial state of $\hat{p}$ is the existing
state $q'_I \in Q_E$ that has outgoing new transitions. The exit state $q'_X$ of $\hat{p}$ is
identified by the Exit State Discovery algorithm:


1. If there exists a $(q', \sigma, q'_I) \in \delta_N$, then $\hat{p}$ is circular: $q'_X = q'_I$. (Fig. 3(i), (ii)).

2. Otherwise, $\hat{p}$ is non-circular. If it is the first (with respect to $S$) non-circular
pattern grafted onto $q'_I$, then $q'_X$ is the unique new state whose transitions
into $A_{i+1}$ are the connecting transitions from Definition 17 (Fig. 3 (iii)).

3. If there is no such state, then $\hat{p}$ is not the first non-circular pattern grafted
onto $q'_I$, and $q'_X$ is the unique existing state $q'_X \neq q'_I$ with new incoming
transitions. (Fig. 3(iv)).


Finally, the new pattern instance is $\hat{p} = \langle \Sigma, q'_I, Q_p, q'_X, \delta_p \rangle$, where
$Q_p = Q_N \cup \{q'_I, q'_X\}$ and $\delta_p$ is the restriction of $\delta_N$ to the states of $Q_p$.


Discovering the pattern $\hat{p}$ (step 2c) In [27] we show that no two enabled
pattern instances in a DFA can share a join state, that if they share any non-edge
states, then one is contained in the other, and finally that a pattern’s join state
is never one of its edge states. This makes finding $\hat{p}$ straightforward: denoting $q_j$
as the parallel of $\hat{p}^3$’s initial state in $A_i$, we seek the enabled composite pattern
instance $(p, q) \in \mathcal{I}_i$ for which $\mathrm{join}(p, q, A_i) = q_j$. If none is present, we seek the
only enabled instance $(p, q) \in \mathcal{I}_i$ that contains $q_j$ as a non-edge state, but is not
yet marked as a composite. (Note that if two enabled instances share a non-edge
state, then the containing one is already marked as a composite: otherwise we
would not have found and enabled the other.)

In [27] we define the concept of a minimal generator and prove the following: 


Theorem 1. Let $A_1, A_2, \ldots, A_n$ be a finite sequence of DFAs that has a minimal
generator $P$. Then the PRS Inference Algorithm will discover $P$.


5.1 Deviations from the PRS framework 


Given a sequence of DFAs generated by the rules of PRS P, the inference 
algorithm given above will faithfully infer P. In practice however, we want to 
apply the algorithm to a sequence of DFAs extracted from a trained RNN using 
the L* algorithm (as in [26]). Such a sequence may contain noise: artifacts from 
an imperfectly trained RNN, or from the behavior of L*. The major deviations
are incorrect pattern creation, simultaneous rule applications, and slow initiation. 


Incorrect pattern creation Whether due to inaccuracies in the RNN classification, 
or as artifacts of the L* process, incorrect patterns are often inserted into the 
DFA sequence. Fortunately, these patterns rarely repeat, and so we can discern 
between them and ‘legitimate’ patterns using a voting and threshold scheme. 

The vote for each discovered pattern $p \in P$ is the number of times it has
been inserted as the new pattern between a pair of DFAs $A_i$, $A_{i+1}$ in $S$. We set a
threshold for the minimum vote a pattern needs to be considered valid, and only
build rules around the connection of valid patterns onto the join states of other
valid patterns. To do this, we modify the flow of the algorithm: before discovering
rules, we first filter invalid patterns by splitting step 2 into two phases. Phase 1:
Mark all the inserted patterns between each pair of DFAs, and compute their
votes. Add to $P$ those whose vote is above the threshold. Phase 2: Consider each
DFA pair $A_i$, $A_{i+1}$ in order. If the new pattern in $A_{i+1}$ is valid, and its initial
state’s parallel state in $A_i$ also lies in a valid pattern, then synthesize the rule
according to the original algorithm. If a pattern is discovered to be composite,
add its composing patterns to $P$.

As almost every DFA sequence produced by our method has some noise, the 
voting scheme greatly extended the reach of our algorithm. 
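
Phase 1 of the voting scheme amounts to counting how often each pattern is inserted and keeping those that reach the threshold. The following is a minimal sketch, assuming each discovered pattern has already been reduced to a hashable canonical label (here a plain string, which is a hypothetical encoding) and assuming a pattern counts as valid when its vote is at least the threshold, in line with the setting in Section 7.1 where patterns with fewer than 2 votes are discarded.

```python
from collections import Counter

def filter_patterns_by_vote(inserted_patterns, threshold=2):
    """Count votes per pattern and return (votes, valid_patterns).

    inserted_patterns: one canonical label per consecutive DFA pair, naming
    the new pattern found between A_i and A_{i+1} (hypothetical encoding).
    """
    votes = Counter(inserted_patterns)
    valid = {pattern for pattern, vote in votes.items() if vote >= threshold}
    return votes, valid

# Toy sequence of insertions: "x" appears once and is treated as noise.
inserted = ["ab", "ab", "x", "cd", "cd", "cd"]
votes, valid = filter_patterns_by_vote(inserted, threshold=2)
print(votes, valid)
```

Phase 2 would then replay the DFA pairs in order and synthesize rules only where both the new pattern and the pattern it attaches to are in `valid`.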


Simultaneous rule applications In the theoretical framework, $A_{i+1}$ differs from
$A_i$ by applying a single PRS rule, and therefore the initial state $q'_I$ and the exit
state $q'_X$ of the new pattern instance are uniquely defined.
L* however does not guarantee such minimal increments between DFAs. In
particular, it may apply multiple PRS rules between two subsequent DFAs,
extending $A_i$ with several patterns. To handle this, we expand the initial and
exit state discovery methods given above.


1. Mark the new states and transitions $Q_N$ and $\delta_N$ as before.

2. Identify the set of new pattern instance initial states (pattern heads): the set
$H \subseteq Q' \setminus Q_N$ of states in $A_{i+1}$ with outgoing new transitions.

3. For each pattern head $q' \in H$, compute the relevant sets $\delta_{N|q'} \subseteq \delta_N$ and
$Q_{N|q'} \subseteq Q_N$ of new transitions and states: the members of $\delta_N$ and $Q_N$ that
are reachable from $q'$ without passing through any existing transitions.

4. For each $q' \in H$, restrict to $Q_{N|q'}$ and $\delta_{N|q'}$ and compute $q'_X$ and $\hat{p}$ as before.
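
Steps 2 and 3 can be sketched as a reachability search restricted to new transitions. This is an illustrative sketch only: transitions are encoded as (source, symbol, target) triples, the function names are hypothetical, and the search is assumed to stop whenever it re-enters an existing state, so that two patterns grafted at different heads are not merged.

```python
def pattern_heads(new_states, new_trans):
    """Existing states that have at least one outgoing new transition."""
    return {src for (src, _, _) in new_trans if src not in new_states}

def reachable_new(head, new_states, new_trans):
    """New states and transitions reachable from `head` via new transitions only."""
    outgoing = {}
    for src, sym, dst in new_trans:
        outgoing.setdefault(src, []).append((sym, dst))
    seen_states, seen_trans = set(), set()
    stack = [head]
    while stack:
        src = stack.pop()
        for sym, dst in outgoing.get(src, []):
            seen_trans.add((src, sym, dst))
            # Do not expand existing states: they end this pattern instance.
            if dst in new_states and dst not in seen_states:
                seen_states.add(dst)
                stack.append(dst)
    return seen_states, seen_trans

# Toy input: one non-circular graft at s1 and one circular graft at s2.
Q_new = {"n1", "n2", "m1"}
d_new = {("s1", "a", "n1"), ("n1", "b", "n2"), ("n2", "b", "s2"),
         ("s2", "c", "m1"), ("m1", "d", "s2")}

heads = pattern_heads(Q_new, d_new)
per_head = {h: reachable_new(h, Q_new, d_new) for h in heads}
print(heads, per_head)
```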


If $A_{i+1}$’s new patterns have no overlap and do not create an ambiguity around
join states, then they may be handled independently and in arbitrary order. They 
are used to discover rules and then enabled, as in the original algorithm. 

Simultaneous but dependent rule applications — such as inserting a pattern 
and then grafting another onto its join state — are more difficult to handle, as it is 
not always possible to determine which pattern was grafted onto which. However, 
there is a special case which appeared in several of our experiments (examples 
$L_{13}$ and $L_{14}$ of Section 7) for which we developed a technique as follows.

Suppose we discover a rule $r_1 : p_0 \twoheadrightarrow_s (p_l \circ p_r) \circ\!= p$ and $p$ contains a cycle $c$
around some internal state $q_j$. If later another rule inserts a pattern $p_n$ at the
state $q_j$, we understand that $p$ is in fact a composite pattern, with $p = p_1 \circ p_2$
and join state $q_j$. However, as patterns do not contain cycles at their edge states,
$c$ cannot be a part of either $p_1$ or $p_2$. We conclude that the addition of $p$ was
in fact a simultaneous application of two rules: $r'_1 : p_0 \twoheadrightarrow_s (p_l \circ p_r) \circ\!= p'$ and
$r_2 : p' \twoheadrightarrow_c (p_1 \circ p_2) \circ\!= c$, where $p'$ is $p$ without the cycle $c$, and update our PRS
and our DFAs’ enabled pattern instances accordingly. The case when $p$ is circular
and $r_1$ is of rule type (2) is handled similarly.


Slow initiation Ideally, $A_1$ directly supplies an initial rule $\bot \twoheadrightarrow p^I$ to our PRS.
In practice, the first few DFAs generated by L* have almost random structure.
We solve this by leaving discovery of the initial rules to the end of the algorithm,
at which point we have a set of ‘valid’ patterns that we are sure are part of the
PRS. From there we examine the last DFA $A_n$ generated in the sequence, note all
the enabled instances $(p^I, q_0)$ at its initial state, and generate a rule $\bot \twoheadrightarrow p^I$ for
each of them. This technique has the weakness that it will not recognise patterns
$p^I$ that do not also appear as extending patterns $p^3$ elsewhere in the sequence,
unless the threshold for patterns is minimal.


6 Converting a PRS to a CFG


We present an algorithm to convert a given PRS to a context free grammar 
(CFG), making the rules extracted by our algorithm more accessible. 


A restriction: Let $P = \langle \Sigma, P, P_c, R \rangle$ be a PRS. For simplicity, we restrict the
PRS so that every pattern $p$ can appear only on the LHS of rules of type (2) or
only on the LHS of rules of type (3), but cannot appear on the LHS of both
types of rules. Similarly, we assume that for each rule $\bot \twoheadrightarrow p_I$, the RHS patterns
$p_I$ are either all circular or all non-circular. This restriction is natural: all of the examples
in Sections 4.1 and 7.3 conform to it. Still, in [27] we show how to remove this
restriction.

We create a CFG $G = \langle \Sigma, N, S, Prod \rangle$. $\Sigma$ is the same alphabet as that of $P$, and
we take $S$ as a special start symbol. For every pattern $p \in P$, let
$G_p = \langle \Sigma_p, N_p, Z_p, Prod_p \rangle$ be a CFG describing $L(p)$. Let $P_Y \subseteq P_c$ be those
composite patterns that appear on the LHS of a rule of type (2). Create the
non-terminal $C_S$ and for each $p \in P_Y$, create an additional non-terminal $C_p$. We set


$N = \{S, C_S\} \cup \bigcup_{p \in P} \{N_p\} \cup \bigcup_{p \in P_Y} \{C_p\}$.


Let $\bot \twoheadrightarrow p_I$ be a rule in $P$. If $p_I$ is non-circular, create a production $S ::= Z_{p_I}$.
If $p_I$ is circular, create the productions $S ::= C_S$, $C_S ::= C_S\, C_S$ and $C_S ::= Z_{p_I}$.
For each rule $p \twoheadrightarrow_s (p_1 \circ p_2) \circ\!= p_3$ create a production $Z_p ::= Z_{p_1} Z_{p_3} Z_{p_2}$. For each
rule $p \twoheadrightarrow_c (p_1 \circ p_2) \circ\!= p_3$ create productions $Z_p ::= Z_{p_1} C_p Z_{p_2}$, $C_p ::= C_p C_p$, and
$C_p ::= Z_{p_3}$. Let $Prod'$ be all the productions defined by the above process.


We set $Prod = \{ \bigcup_{p \in P} Prod_p \} \cup Prod'$.
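

The construction of $Prod'$ can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: patterns are referred to by hypothetical string names, rules are plain tuples, the per-pattern grammars $Prod_p$ are omitted, and the rule list `dyck2_circular` encodes one reading of the Dyck-2 PRS from Example 2 (with the composite patterns assumed to be circular).

```python
def prs_to_cfg_productions(initial_rules, serial_rules, circular_rules):
    """Build Prod' (without the per-pattern grammars Prod_p).

    initial_rules:  list of (pattern, is_circular) for rules  bot ->> pattern
    serial_rules:   list of (p, p1, p2, p3) for rules of type (3)
    circular_rules: list of (p, p1, p2, p3) for rules of type (2)
    Productions are returned as (lhs, rhs_tuple) pairs, duplicates removed.
    """
    def Z(p):
        return f"Z_{p}"

    def C(p):
        return f"C_{p}"

    prods = []
    for p_init, is_circular in initial_rules:
        if is_circular:
            prods += [("S", ("C_S",)), ("C_S", ("C_S", "C_S")), ("C_S", (Z(p_init),))]
        else:
            prods.append(("S", (Z(p_init),)))
    for p, p1, p2, p3 in serial_rules:
        # p3 is grafted once between p1 and p2.
        prods.append((Z(p), (Z(p1), Z(p3), Z(p2))))
    for p, p1, p2, p3 in circular_rules:
        # C_p generates any number of circular patterns between p1 and p2.
        prods += [(Z(p), (Z(p1), C(p), Z(p2))), (C(p), (C(p), C(p))), (C(p), (Z(p3),))]
    return list(dict.fromkeys(prods))  # de-duplicate, keep first-seen order

# Dyck-2 as read from Example 2: p5 = "(" o_c ")", p6 = "[" o_c "]".
dyck2_initial = [("p5", True), ("p6", True)]
dyck2_circular = [("p5", "p1", "p2", "p5"), ("p5", "p1", "p2", "p6"),
                  ("p6", "p3", "p4", "p5"), ("p6", "p3", "p4", "p6")]
dyck2_prods = prs_to_cfg_productions(dyck2_initial, [], dyck2_circular)
dyck2_symbols = {lhs for lhs, _ in dyck2_prods} | {s for _, rhs in dyck2_prods for s in rhs}

# a^n b^n as in Example 1, with p12 standing for the composite p1 o p2.
anbn_prods = prs_to_cfg_productions([("p12", False)], [("p12", "p1", "p2", "p12")], [])

for lhs, rhs in dyck2_prods:
    print(lhs, "::=", " ".join(rhs))
```

Under these assumptions the Dyck-2 input yields 12 productions over 10 grammar symbols, which agrees with the figures quoted for this construction in the Succinctness paragraph, although whether the paper counts exactly the same set is not confirmed here.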


Theorem 2. Let G and P be as above. Then L(P) = L(G). 


The proof is given in the extended version of this paper [27]. 


Expressibility Every RE-Dyck language (Section 2.2) can be expressed by a PRS, 
but the converse is not true; RE-Dyck languages nest delimiters arbitrarily, while 
PRS grammars may not. For instance, language L12 of Section 7.3 is not a Dyck 
language. Meanwhile, not every CFL can be expressed by a PRS [27]. 


Succinctness The construction above does not necessarily yield a minimal CFG 
G. For a PRS defining the Dyck language of order 2 — which can be expressed by 
a CFG with 4 productions and 1 non-terminal — our construction yields a CFG 
with 10 non-terminals and 12 productions. In this case, and often in others, we 
can recognise and remove the spurious productions from the generated grammar. 


7 Experimental results 


7.1 Methodology 


We test the algorithm on several PRS-expressible context free languages, attempt- 
ing to extract them from trained RNNs using the process outlined in Figure 1. 
For each language, we create a probabilistic CFG generating it, train an RNN 
on samples from this grammar, extract a sequence of DFAs from the RNN, and 
apply our PRS inference algorithm. Finally, we convert the extracted PRS back 
to a CFG, and compare it to our target CFG. 

In all of our experiments, we use a vote-threshold s.t. patterns with fewer than
2 votes are not used to form any PRS rules (Section 5.1). Using no threshold
significantly degraded the results by including too much noise, while higher 
thresholds often caused us to overlook correct patterns and rules. 


7.2 Generating a sequence of DFAs


We obtain a sequence of DFAs for a given CFG using only positive samples [11,1] by
training a language-model RNN (LM-RNN) on these samples and then extracting 
DFAs from it with the aid of the L* algorithm [2], as described in [26]. To apply 
L* we must treat the LM-RNN as a binary classifier. We set an ‘acceptance 
threshold’ t and define the RNN’s language as the set of sequences s satisfying: 
1. the RNN’s probability for an end-of-sequence token after s is greater than t, 
and 2. at no point during s does the RNN pass through a token with probability 
< t. This is identical to the concept of locally t-truncated support defined in [13]. 
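
The two acceptance conditions can be written as a short predicate over a next-token distribution. The sketch below is hedged: `next_token_probs` is a hypothetical interface that maps a prefix to a dictionary of next-token probabilities, `toy_lm` is a hand-written stand-in for a trained LM-RNN, and the `<eos>` token name is an assumption.

```python
def accepts(next_token_probs, sequence, t, eos="<eos>"):
    """Binary classification of `sequence` under acceptance threshold t.

    Condition 2: every token of the sequence has probability >= t given its prefix.
    Condition 1: the end-of-sequence token has probability > t after the sequence.
    """
    prefix = []
    for token in sequence:
        if next_token_probs(prefix).get(token, 0.0) < t:
            return False
        prefix.append(token)
    return next_token_probs(prefix).get(eos, 0.0) > t

def toy_lm(prefix):
    """Hand-written stand-in for an LM-RNN roughly modelling a^n b^n."""
    if not prefix:
        return {"a": 1.0}
    if "b" not in prefix:
        return {"a": 0.5, "b": 0.5}
    unmatched = prefix.count("a") - prefix.count("b")
    if unmatched > 0:
        return {"b": 0.99, "a": 0.005, "<eos>": 0.005}
    return {"<eos>": 0.99, "a": 0.005, "b": 0.005}

print(accepts(toy_lm, "aabb", t=0.1))  # balanced: accepted
print(accepts(toy_lm, "aab", t=0.1))   # end-of-sequence too unlikely: rejected
print(accepts(toy_lm, "abab", t=0.1))  # passes through a low-probability token: rejected
```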

To create the samples for the RNNs, we write a weighted version of the CFG, 
in which each non-terminal is given a probability over its rules. We then take 
N samples from the weighted CFG according to its distribution, split them into 
train and validation sets, and train an RNN on the train set until the validation 
loss stops improving. In our experiments, we used N = 10,000. For our languages, 
we used very small 2-layer LSTMs: hidden dimension 10 and input dimension 4. 
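
Sampling from a weighted CFG can be sketched as a recursive expansion in which each non-terminal picks one of its rules according to the given probabilities. The grammar, the 90/10 split, and the sample count of 1,000 below are illustrative assumptions (the experiments above use N = 10,000 and do not state the split ratio).

```python
import random

def sample_sentence(grammar, symbol, rng):
    """Expand `symbol` into a list of terminals by sampling weighted rules.

    grammar maps each non-terminal to a list of (rhs_tuple, probability);
    any symbol that is not a key of `grammar` is treated as a terminal.
    """
    if symbol not in grammar:
        return [symbol]
    expansions = [rhs for rhs, _ in grammar[symbol]]
    weights = [weight for _, weight in grammar[symbol]]
    rhs = rng.choices(expansions, weights=weights, k=1)[0]
    sentence = []
    for child in rhs:
        sentence.extend(sample_sentence(grammar, child, rng))
    return sentence

# Hypothetical weighted grammar for a^n b^n (n >= 1).
grammar = {"S": [(("a", "S", "b"), 0.6), (("a", "b"), 0.4)]}

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_sentence(grammar, "S", rng) for _ in range(1000)]
split = int(0.9 * len(samples))
train, validation = samples[:split], samples[split:]
print(len(train), len(validation), "".join(samples[0]))
```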

In some cases, especially when all of the patterns in the rules are several
tokens long, the extraction of [26] terminates too soon: neither L* nor the RNN
abstraction considers long sequences, and equivalence is reached between the
L* hypothesis and the RNN abstraction despite neither being equivalent to the
‘true’ language of the RNN. In these cases we push the extraction a little further
using two methods: first, if the RNN abstraction contains only a single state,
we make an arbitrary initial refinement by splitting 10 hidden dimensions, and
restart the extraction. If this is also not enough, we sample the RNN according
to its distribution, in the hope of finding a counterexample to return to L*. The
latter approach is not ideal: sampling the RNN may return very long sequences,
effectively increasing the next DFA by many rule applications. We place a time
limit of 1,000 seconds (about 17 minutes) on the extraction.


7.3 Languages


We experiment on 15 PRS-expressible languages $L_1$ to $L_{15}$, grouped into 3 classes:


1. Languages of the form $X^n Y^n$, for various regular expressions $X$ and $Y$. In
particular, the languages $L_1$ through $L_6$ are $X_i^n Y_i^n$ for: $(X_1,Y_1)$=(a,b),
$(X_2,Y_2)$=(a|b,c|d), $(X_3,Y_3)$=(ab|cd,ef|gh), $(X_4,Y_4)$=(ab,cd),
$(X_5,Y_5)$=(abc,def), and $(X_6,Y_6)$=(ab|c,de|f).

2. Dyck and RE-Dyck languages. In particular, languages $L_7$ through $L_9$ are
the Dyck languages of order 2 through 4, and $L_{10}$ and $L_{11}$ are RE-Dyck
languages of order 1 with the delimiters $(L_{10},R_{10})$=(abcde,vwxyz) and
$(L_{11},R_{11})$=(ab|c,de|f).

3. Variations of the Dyck languages. $L_{12}$ is the language of alternating single-
nested delimiters, generating only sequences of the sort ([([])]) or [([])].
$L_{13}$ and $L_{14}$ are Dyck-1 and Dyck-2 with additional neutral tokens a,b,c
that may appear multiple times anywhere in the sequence. $L_{15}$ is like $L_{13}$
except that the neutral additions are the token d and the sequence abc, e.g.:
(abc()())d is in $L_{15}$, but a(bc()())d is not.


LG    DFAs   Init Pats   Final Pats   Min/Max Votes   CFG Correct
L1    18     1           1            16/16           Correct
L2    16     1           1            14/14           Correct
L3    14     6           4            2/4             Incorrect
L4    8      2           1            5/5             Correct
L5    10     2           1            7/7             Correct
L6    22     9           4            3/16            Incorrect
L7    24     2           2            11/11           Correct
L8    22     5           4            2/9             Partial
L9    30     6           4            5/8             Correct
L10   6      2           1            3/3             Correct
L11   24     6           3            5/12            Incorrect
L12   28     2           2            13/13           Correct
L13   9      6           1            2/2             Correct
L14   17     5           2            5/7             Correct
L15   13     6           4            3/6             Incorrect


Table 1. Results of experiments on DFAs extracted from RNNs


7.4 Results 


Table 1 shows the results. The 2nd column shows the number of DFAs extracted 
from the RNN. The 3rd and 4th columns present the number of patterns found 
by the algorithm before and after applying vote-thresholding to remove noise. 
The 5th column gives the minimum and maximum votes received by the final 
patterns (we count only patterns introduced as a new pattern $\hat{p}^3$ in some $A_{i+1}$).
The 6th column notes whether the algorithm found a correct CFG, according 
to our manual inspection. For languages where our algorithm only missed or 
included 1 or 2 valid/invalid productions, we label it as partially correct. 


Alternating Patterns Our algorithm struggled on the languages $L_3$, $L_6$, and
$L_{11}$, which contained patterns whose regular expressions had alternations (such
as ab|cd in $L_3$, and ab|c in $L_6$ and $L_{11}$). Investigating their DFA sequences
uncovered that the L* extraction had ‘split’ the alternating expressions,
adding their parts to the DFAs over multiple iterations. For example, in the
sequence generated for L3, ef appeared in Ay without gh alongside it. The next 
DFA corrected this mistake but the inference algorithm could not piece together 
these two separate steps into a single rule. It will be valuable to expand the 
algorithm to these cases. 


Simultaneous Applications Originally our algorithm failed to accurately generate
$L_{13}$ and $L_{14}$ due to simultaneous rule applications. Using the technique
described in Section 5.1, we were able to correctly infer these grammars. However,
more work is needed to handle simultaneous rule applications in general.

Additionally, sometimes a very large counterexample was returned to L*,
creating a large increase in the DFAs: the 9th iteration of the extraction on
Ls introduced almost 30 new states. The algorithm does not manage to infer 
anything meaningful from these nested, simultaneous applications. 


Missing Rules For the Dyck languages $L_7$ to $L_9$, the inference algorithm was mostly
successful. However, due to the large number of possible delimiter combinations,
some patterns and nesting relations did not appear often enough in the DFA
sequences. As a result, for $L_8$, some productions were missing in the generated
grammar. $L_8$ also created one incorrect production due to noise in the sequence
(one erroneous pattern was generated twice, passing the threshold).


RNN Noise In $L_{15}$, the extracted DFAs for some reason always forced a
single character d to be included between every pair of delimiters. Our inference
algorithm of course maintained this peculiarity. It correctly allowed the
optional embedding of “abc” strings. But due to noisy (incorrect) generated
DFAs, the patterns generated did not maintain balanced parentheses.


8 Related work 


Training RNNs to recognize Dyck Grammars. Recently there has been a surge 
of interest in whether RNNs can learn Dyck languages [5,19,21,28]. While these 
works report very good results on learning the language for sentences of similar 
distance and depth as the training set, with the exception of [21], they report 
significantly lower accuracy for out-of-sample sentences. 

Among these, Sennhauser and Berwick [19] use LSTMs, and show that in 
order to keep the error rate within a 5 percent tolerance, the number of hidden 
units must grow exponentially with the distance or depth of the sequences 
(though Hewitt et al. [13] find much lower theoretical bounds). They conclude
that LSTMs do not learn rules, but rather statistical approximations. Bernardy 
[5] experimented with various RNN architectures, finding in particular that the 
LSTM has more difficulty in predicting closing delimiters in the middle of a 
sentence than at the end. Based on this, he conjectures that the RNN is using 
a counting mechanism, but has not truly learnt the Dyck language (its CFG). 
For the simplified task of predicting only the final closing delimiter of a legal 
sequence, Skachkova, Trost and Klakow [21] find that LSTMs have nearly perfect 
accuracy across words with large distances and embedded depth. 

Yu, Vu and Kuhn [28] compare the three works above, and note that the task 
of predicting only the closing bracket of a balanced Dyck word is not sufficient 
for checking if an RNN has learnt the language, as it can be computed by only a 
counter. In their experiments, they present a prefix of a Dyck word and train 
the RNN to predict the next valid closing bracket. They experiment with an 
LSTM using 4 different models, and show that the generator-attention model 
[17] performs the best, and is able to generalize quite well at the tagging task.
However, they find that it degrades rapidly with out-of-domain tests. They also 
conclude that RNNs do not really learn the Dyck language. These experimental 
results are reinforced by the theoretical work in [13], who remark that no finite 
precision RNN can learn a Dyck language of unbounded depth, and give precise 
bounds on the memory required to learn a Dyck language of bounded depth. 

Despite these findings, our algorithm nevertheless extracts a CFG from a 
trained RNN, discovering rules based on DFAs synthesized from the RNN using 
the algorithm in [26]. Because we can use a short sequence of DFAs to extract 
the rules, and because the first DFAs in the sequence describe Dyck words with 
increasing but limited distance and depth, we are often able to extract the
CFG perfectly even when the RNN does not generalize well. Moreover, we show 
that our approach works with more complex types of delimiters, and on Dyck 
languages with expressions between delimiters. 


Extracting DFAs from RNNs. There have been many approaches to extract higher 
level representations from a neural network (NN), both to facilitate comprehension 
and to verify correctness. One of the oldest approaches is to extract rules from 
a NN [24,12]. In particular, several works attempt to extract FSAs from RNNs 
[18,15,25]. We base our work on [26]. Its ability to generate sequences of DFAs 
providing increasingly better approximations of the CFL is critical to our method. 

There has been less research on extracting a CFG from an RNN. One exception 
is [23], where they develop a Neural Network Pushdown Automata (NNPDA) 
framework, a hybrid system augmenting an RNN with external stack memory. 
They show how to extract a push-down automaton from an NNPDA; however,
their technique relies on the PDA-like structure of the inspected architecture. In 
contrast, we extract CFGs from RNNs without stack augmentation. 


Learning CFGs from samples. There is a wide body of work on learning CFGs 
from samples. An overview is given in [10] and a survey of work for grammatical 
inference applied to software engineering tasks can be found in [22]. 

Clark et al. study algorithms for learning CFLs given only positive examples
[11]. In [7], Clark and Eyraud show how one can learn a subclass of CFLs called
CF substitutable languages. There are many languages that can be expressed by a
PRS but are not substitutable, such as $x^n b^n$. However, there are also substitutable
languages that cannot be expressed by a PRS ($w a w^R$; see [27]). In [8], Clark,
Eyraud and Habrard present Contextual Binary Feature Grammars. However,
this formalism does not include Dyck languages of arbitrary order. None of these
techniques deal with noise in the data, which is essential to learning a language
from an RNN.


9 Future Directions 


Currently, for each experiment, we train the RNN on that language and then 
apply the PRS inference algorithm on a single DFA sequence generated from that 
RNN. Perhaps the most substantial improvement we can make is to extend our 
technique to learn from multiple DFA sequences. We can train multiple RNNs 
and generate DFA sequences for each one. We can then run the PRS inference 
algorithm on each of these sequences, and generate a CFG based upon rules 
that are found in a significant number of the runs. This would require care to 
guarantee that the final rules form a cohesive CFG. It would also address the 
issue that not all rules are expressed in a single DFA sequence, and that some 
grammars may have rules that are executed only once per word of the language. 

Our work generates CFGs for generalized Dyck languages, but it is possible 
to generalize PRSs to express a greater range of languages. Work will then be 
needed to extend the PRS inference algorithm. 


Abstract. We introduce an automated, formal, counterexample-based
approach to synthesise Barrier Certificates (BC) for the safety verification 
of continuous and hybrid dynamical models. The approach is underpinned 
by an inductive framework: this is structured as a sequential loop between 
a learner, which manipulates a candidate BC structured as a neural 
network, and a sound verifier, which either certifies the candidate’s 
validity or generates counter-examples to further guide the learner. We 
compare the approach against state-of-the-art techniques, over polynomial 
and non-polynomial dynamical models: the outcomes show that we can 
synthesise sound BCs up to two orders of magnitude faster, with in 
particular a stark speedup on the verification engine (up to three orders 
less), whilst needing a far smaller data set (up to three orders less) for 
the learning part. Beyond improvements over the state of the art, we 
further challenge the new approach on a hybrid dynamical model and on 
larger-dimensional models, and showcase the numerical robustness of our 
algorithms and codebase. 


1 Introduction 


Barrier Certificates (BC) are an effective and powerful technique to prove safety 
properties on models of continuous dynamical systems, as well as hybrid models 
(featuring both continuous and discrete states) [21,22]. Whenever found, a BC 
partitions the state space of the model into two parts, ensuring that all trajectories 
starting from a given initial set, located within one side of the BC, cannot reach 
a given set of states (deemed to be unsafe), located on the other side. Thus a 
successful synthesis of a BC (which is in general not a unique object) represents 
a formal proof of safety for the dynamical model. BC find various applications 
spanning robotics, multi-agent systems, and biology [7,32]. 

This work addresses the safety of dynamical systems modelled in general by 
non-linear ordinary differential equations (ODE), and presents a novel method for 
the automated and formal synthesis of BC. The approach leverages Satisfiability 
Modulo Theory (SMT) and inductive reasoning (CEGIS, Figure 1, introduced 
later), to guarantee the correctness of the automated synthesis procedure: this 
rules out both algorithmic and numerical errors related to BC synthesis [10]. 


© The Author(s) 2021 
Background and Related Work A few techniques have been developed to
synthesise BC. For polynomial models, sum-of-squares (SOS) and semi-definite 
programming relaxations [14,16,29] convert the BC synthesis problem into con- 
straints expressed as linear or bilinear matrix inequalities: these are numerically 
solved as a convex optimisation problem, however unsoundly. To increase scalabil- 
ity and to enhance expressiveness, numerous barrier formats have been considered: 
BC based on exponential conditions are presented in [14]; BC based on Darboux 
polynomials are outlined in [33]; [30] newly introduces a multi-dimensional gen- 
eralisation of BC, thus broadening their scope and applicability. BC can also be 
used to verify safety of uncertain (e.g. parametric) models [20]. Let us remark 
that SOS approaches are typically unsound, namely they rely on iterative and 
numerical methods to synthesise the BC. [10] a-posteriori verifies SOS candidates 
via computer-aided design (CAD) techniques [15]. 

Model invariants (namely, regions that provably contain model trajectories, 
such as basins of attractions [28]) can be employed as BC, though their synthesis 
is less general, as it does not comprise an unsafe set and tacitly presupposes the 
initial set to be “well placed” within the state space (that is, within the aforemen- 
tioned basin): [19] introduces a fixpoint algorithm to find algebraic-differential 
invariants for hybrid models; invariants can be characterised analytically [4] or 
synthesised computationally [8]. Invariants can be alternatively studied by Lya- 
punov theory [5], which provides stability guarantees for dynamical models, and 
thus can characterise invariants (and barriers) as side products: however this again 
requires that initial conditions are positioned around stable equilibria, and does 
not explicitly encompass unsafe sets in the synthesis. Whilst Lyapunov theory is 
classically approached either analytically (explicit synthesis) or numerically (with 
unsound techniques), an approach that is relevant for the results of this work 
looks at automated and sound Lyapunov function synthesis: in [27] Lyapunov 
functions are soundly found within parametric templates, by constructing a sys- 
tem of linear inequality constraints over unknown coefficients. [23,24,25] employ a 
counterexample-based approach to synthesise control Lyapunov functions, which 
inspires this work, using a combination of SMT solvers and convex optimisation 
engines: however unlike this work, SMT solvers are never used for verification, 
which is instead handled by solving optimisation problems that are numerically 
unsound. As argued above, let us emphasise again that the BC synthesis problem, 
as studied in this work, cannot in general be reduced to a problem of Lyapunov 
stability analysis, and is indeed more general. 


[Figure: the Learner (NN) passes a candidate BC to the Verifier (SMT); the Verifier returns either a counter-example to the Learner, or a valid B.]

Fig. 1. Schematic representation of the CEGIS loop.

Core approach We introduce a method that efficiently exploits machine
learning, whilst guaranteeing formal proofs of correctness via SMT. We leverage 
a CounterExample-Guided Inductive Synthesis (CEGIS) procedure [31], which is 
structured as an inductive loop between a Learner and a Verifier (cf. Fig. 1). 
A learner numerically (and unsoundly) trains a neural network (NN) to fit over 
a finite set of samples the requirements for a BC, which are expressed through 
a loss function; then a verifier either formally proves the validity of the BC or 
provides (a) counter-example(s) through an SMT solver: the counter-examples 
indicate where the barrier conditions are violated, and are passed back to the 
learner for further training. This synthesis method for neural BC is formally 
sound and fully automated, and thanks to its specific new features, is shown to 
be much faster and to clearly require less data than state-of-the-art results. 


Contributions beyond the State of the Art Cognate work [34] presents a 
method to compute BC using neural networks and to verify their correctness 
a-posteriori: as such, it does not generate counter-examples within an inductive 
loop, as in this work. [34] considers large sample sets that are randomly divided 
into batches and fed to a feed-forward NN; the verification at the end of the 
(rather long) training either validates the candidate, or invalidates it and the 
training starts anew on the same dataset. In Section 4 the method in [34] is shown 
to be slower (both in the training and in the verification), and to require more 
data than the CEGIS-based approach of this work, which furthermore introduces 
numerous bespoke optimisations, as outlined in Section 3: our CEGIS-based 
technique exploits fast learning, verification simplified by the candidates passed 
by the Learner, and an enhanced communication between Learner and Verifier. 
Our approach further showcases numerical robustness and scalability features. 
Related to the work on BC is the synthesis of Lyapunov functions, mentioned 
above. The construction of Lyapunov Neural Networks (LNNs) has been studied 
with approaches based on simulations and numerical optimisation, which are in 
general unsound [26]. Formal methods for Lyapunov synthesis are introduced in 
[5], together with a counterexample-based approach using polynomial candidates. 
The work is later extended in [2], which employs NN as candidates over poly- 
nomial dynamical models. The generation of control Lyapunov functions using 
counterexample-based NN is similarly considered in [9], however this is done 
by means of differing architectural details and does not extend to BC synthesis. 
Beyond the work in [5], this contribution is not limited to a specific polynomial 
template, since it supports more general mixtures of polynomial functions ob- 
tained through the NN structure, as well as the canonical tanh, sigmoid, ReLU 
activations (we provide one example of BC using tanh activations). Compared to 
[5], where we use LP programming to synthesise Lyapunov functions, in this work: 
a) we use a template-free procedure, thanks to the integration of NNs - these 
are needed since template-based SOS-programming approaches are not sufficient 
to provide BCs for several of the presented benchmarks (see Section 4 and [34]); 
b) we provide an enhanced loss function (naturally absent from [5]), enriched 
counter-example generation, prioritised check of the verification constraints, and 
c) we newly synthesise verified barrier certificates for hybrid models, which are 
generated using counterexample-based, neural architectures. Finally, beyond [5]
the new approach is endowed with numerical robustness features. 

SOS programming solutions [14,16,29] are not directly comparable to this work.
First and foremost, they are not sound, i.e. they do not offer a formal guarantee of numerical
and algorithmic correctness. The exception is [10], which verifies SOS candidates
a-posteriori by means of CAD [15] techniques that are known not to scale
well. Furthermore, they can hardly be embedded within a CEGIS loop - we
experimentally show that SOS candidates are handled with difficulty by SMT
solvers. Finally, they hardly cope with the experiments we have considered, as
already observed in [34]. We instead use SMT solvers (Z3 [11] and dReal [13]) 
within CEGIS to provide sound outcomes based on NN candidates, proffering a 
new approach that synthesises and formally verifies candidate BCs altogether, 
with minimum effort from the user. 


Organisation The remainder of the paper is organised as follows: Section 2 
presents preliminary notions on BCs and outlines the problem. Section 3 describes 
the approach: training of the NN in Sec. 3.1 and verification in Sec. 3.2. Section 
4 presents case studies, Section 5 delineates future work. 


2 Safety Analysis with Barrier Certificates 


We address the safety verification of continuous-time dynamical models by de- 
signing barrier certificates (BC) over the continuous state space X of the model. 
We consider n-dimensional dynamical models described by 


$$\dot{x}(t) = \frac{\mathrm{d}x}{\mathrm{d}t} = f(x), \qquad x(0) = x_0 \in X_0 \subset X, \qquad (1)$$


where $f : X \to \mathbb{R}^n$ is a continuous vector field, $X \subset \mathbb{R}^n$ is an open set defining
the state space of the system, and $X_0$ represents the set of initial states. Given
model (1) and an unsafe set $X_u \subset X$, the safety verification problem concerns
checking whether or not any trajectory of the model originating from $X_0$ reaches
the unsafe region $X_u$. BC offer a sufficient condition asserting the safety of the
model, namely that no trajectory enters the unsafe region.


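Definition 1. The Lie derivative of a continuously differentiable scalar function
$B : X \to \mathbb{R}$, with respect to a vector field $f$, is defined as follows

$$\dot{B}(x) = \nabla B(x) \cdot f(x) = \sum_{i=1}^{n} \frac{\partial B}{\partial x_i} \frac{\mathrm{d}x_i}{\mathrm{d}t} = \sum_{i=1}^{n} \frac{\partial B}{\partial x_i} f_i(x). \qquad (2)$$

Intuitively, this derivative denotes the rate of change of function $B$ along the
model trajectories.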
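
As a small numerical illustration of Definition 1, the sketch below approximates the Lie derivative with central finite differences. The function name `lie_derivative`, the candidate `B` and the vector field `f` are hypothetical and chosen only for illustration; they are not taken from the benchmarks of this work.

```python
def lie_derivative(B, f, x, h=1e-6):
    """Approximate dB/dt = grad B(x) . f(x) using central differences."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        dB_dxi = (B(x_plus) - B(x_minus)) / (2.0 * h)  # partial derivative of B
        total += dB_dxi * fx[i]                        # weighted by f_i(x)
    return total


# Hypothetical example: B(x, y) = x^2 + y^2 - 1 and f(x, y) = (-x, -y).
B = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
f = lambda x: [-x[0], -x[1]]

# Analytically, grad B . f = -2 (x^2 + y^2), i.e. -2 at the point (1, 0).
value = lie_derivative(B, f, [1.0, 0.0])
```

A negative value at a point of the level set $B(x) = 0$ indicates that trajectories cross the level set towards the region where $B$ is negative.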


Proposition 1 (Barrier Certificate for Safety Verification, [21]). Let the
model in (1) and the sets $X$, $X_0$ and $X_u$ be given. Suppose there exists a function
$B : X \to \mathbb{R}$ that is differentiable with respect to its argument and satisfies the
following conditions:

$$B(x) \le 0 \ \ \forall x \in X_0, \qquad B(x) > 0 \ \ \forall x \in X_u, \qquad \dot{B}(x) < 0 \ \ \forall x \in X \text{ s.t. } B(x) = 0, \qquad (3)$$

then the safety of the model is guaranteed. That is, there is no trajectory of the
model contained in $X$, starting from the initial set $X_0$, that ever enters set $X_u$.


Consider a trajectory $x(t)$ starting in $x_0 \in X_0$ and the evolution of $B(x(t))$ along
this trajectory. Whilst the first of the three conditions guarantees that $B(x_0) \le 0$,
the last condition asserts that the value of $B(x(t))$ is strictly decreasing whenever the
trajectory touches the level set $B(x) = 0$, so that $B(x(t))$ can never become positive.
Hence such a trajectory $x(t)$ cannot enter the set $X_u$, where $B(x) > 0$
(second condition), thus ensuring the safety of the model.


3 Synthesis of Neural Barrier Certificates via Learning 
and Verification 


We introduce an automated and formal approach for the construction of barrier 
certificates (BC) that are expressed as feed-forward neural networks (NN). The 
procedure leverages CEGIS (see Fig. 1) [31], an automated and sound procedure 
for solving second-order logic synthesis problems, which comprises two interacting 
parts. The first component is a Learner, which provides candidate BC functions 
by training a NN over a finite set of sample inputs. The network is then translated 
into a logical formula in an appropriate theory, by evaluating it with symbolic 
inputs, instead of canonical floating point numbers. The details of this conversion 
are outlined in [2]. This encoded candidate is passed to the second component, 
a Verifier, which acts as an oracle: either it proves that the solution is valid, or
it finds one or more instances (called counter-examples) where the candidate
BC does not comply with the required conditions. The verifier consists of an SMT
solver [15], namely an algorithmic decision procedure that extends Boolean SAT 
problems to richer, more expressive theories, such as non-linear arithmetics. 
More precisely, the learner trains a NN composed of n input neurons (this 
matches the dimension of the model f), k hidden layers, and one output neuron 
(recall that B(x) is a scalar function): this NN candidate B is required to closely 
match the conditions in Eq. (3) over a discrete set of samples S, which is 
initialised randomly. The verifier checks whether the candidate B violates any 
of the conditions in Eq. (3) over the entire set X and, if so, produces one (or 
more, as in this work) counter-examples c. We add c to the samples set S as the 
loop restarts, hence forcing the NN to be trained also over the generated counter- 
examples c. Note that the NN retains its old weights, and restarts the training 
from the weights obtained at the end of the previous session. This loop repeats 
until the SMT verifier proves that no counter-examples exist or until a timeout is 
reached. CEGIS offers a scalable and flexible alternative for BC synthesis: on the 
one hand, the learner does not require soundness, and ensures a rapid synthesis 
exploiting the training of NN architectures; on the other, the algorithm is sound, 
i.e. a valid output from the SMT-based verifier is provably correct; of course we 
cannot claim any completeness, since CEGIS might in general not terminate with
a solution because it operates over a continuous model. 

The performance of the CEGIS algorithm in practice hinges on the effective 
exchange of information between the learner and the verifier [3]. A core contri- 
bution of this work is to tailor the CEGIS architecture to the problem of BC 
synthesis: we devise several improvements to NN training, such as a bespoke loss 
function and a multi-layer NN architecture that ensures robustness and outputs 
a function that is tailored to the verification engine. Over consecutive loops, the 
verifier may return similar counter-examples: we thus propose a more informative 
counter-examples generation by the SMT verifier that is adapted to the candidate 
BC and the underlying dynamical model. These tailored architectural details 
generate in practice a rapid, efficient, and robust CEGIS loop, which is shown in 
this work to clearly outperform state-of-the-art methods. 


3.1 Training of the Barrier Neural Network 


The learner instantiates the candidate BC using the hyper-parameters k and h 
(depth and width of the NN), trains it over the N samples in the set S, and later 
refines its training whenever the verifier adds counter-examples to the set S. The 
class of candidate BC comprises multi-layered, feed-forward NN with polynomial 
and non-polynomial activation functions. Unlike most learning applications, the 
choice of polynomial activations comes from the need for interpretable outputs 
from the NN, whose analytical expression must be readily processed by the 
verifier. The order $\gamma$ of the polynomial activations is a hyper-parameter fed at
the start of the procedure: we split the $i$-th hidden layer into $\gamma$ portions and
apply polynomial activations of order $j$ to the neurons of the $j$-th portion.


Example 1 (Polynomial Activations). Assume a NN composed of an input $x$,
3 hidden neurons and 1 activation-free output, with $\gamma$-th order polynomial
activation, $\gamma = 3$. We split the hidden layer in $\gamma$ sub-vectors, each containing one
neuron. The hidden layer after the activation results in

$$z = \left[ W_1^{(1)} x + b_1, \ \ (W_2^{(1)} x + b_2)^2, \ \ (W_3^{(1)} x + b_3)^3 \right]^{\top},$$

where the $W_i^{(1)}$ are the $i$-th rows of the first-layer weight matrix, and the $b_i$ form
the bias vector.
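
The split of a hidden layer into portions with increasing activation order can be sketched as follows. This is a minimal illustration in plain Python: the helper name `poly_layer` and the numerical weights are hypothetical, and equal-sized portions are assumed.

```python
def poly_layer(W, b, x, gamma):
    """Hidden layer split into `gamma` equal portions; the j-th portion
    (counting from 1) applies the activation z -> z**j."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    size = len(pre) // gamma
    assert size * gamma == len(pre), "layer width must be divisible by gamma"
    return [z ** (idx // size + 1) for idx, z in enumerate(pre)]


# Hypothetical weights for a 2-dimensional input and 3 hidden neurons.
W = [[1.0, 0.0],
     [0.0, 2.0],
     [1.0, 1.0]]
b = [0.5, -1.0, 0.0]

# Pre-activations are [1.5, 3.0, 3.0]; orders 1, 2, 3 are applied in turn.
z = poly_layer(W, b, [1.0, 2.0], gamma=3)
```

With $\gamma = 3$ and three neurons, each portion holds a single neuron, which matches the setting of Example 1.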


The learning process updates the NN parameters to improve the satisfaction of
the BC conditions in (3): $B(x) \le 0$ for $x \in X_0$, $B(x) > 0$ for $x \in X_u$, and a
negative Lie derivative $\dot{B}$ (Eq. (2)) over the set implicitly defined by $B(x) = 0$.
The training minimises a loss comprising three terms, namely

$$L = L_0 + L_u + L_d = \frac{1}{N} \sum_{i=1}^{N} \Big( \max_{s_i \in X_0} \{\tau_0, B(s_i)\} + \max_{s_i \in X_u} \{\tau_u, -B(s_i)\} + \max_{s_i : B(s_i) = 0} \{\tau_d, \dot{B}(s_i)\} \Big), \qquad (4)$$
where $s_i$, $i = 1, \ldots, N$ are the samples taken from the set $S$. The constants
$\tau_0, \tau_u, \tau_d$ are offsets, added to improve the numerical stability of the training.
Notably, $B(x) = 0$ can be a set with small volume, thus it is highly unlikely
that a single sample $s$ will satisfy $B(s) = 0$. We thus relax this last condition
and consider a belt $\mathcal{B}$ around $B(s) = 0$, namely $\mathcal{B} = \{x : |B(x)| < \beta\}$, which depends
on the hyper-parameter $\beta$. Note that we must use continuously differentiable
activations throughout, as we require the existence of Lie derivatives (cf. Eq. (2)),
and thus cannot leverage simple ReLUs.
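
A sample-based evaluation of a loss of this form, with the belt replacing the condition $B(s) = 0$, might look as follows. This is only a sketch: the function `basic_loss`, the 1-dimensional model $\dot{x} = -x$, the sets and both candidates are hypothetical, and all offsets are set to zero for readability.

```python
def basic_loss(samples, B, Bdot, in_X0, in_Xu, tau0, tauu, taud, beta):
    """Three-term loss over samples; the belt |B(s)| < beta stands in
    for the level set B(s) = 0."""
    total = 0.0
    for s in samples:
        if in_X0(s):
            total += max(tau0, B(s))      # B should be non-positive on X0
        if in_Xu(s):
            total += max(tauu, -B(s))     # B should be positive on Xu
        if abs(B(s)) < beta:
            total += max(taud, Bdot(s))   # dB/dt should be negative in the belt
    return total / len(samples)


# Hypothetical 1-d setting: dx/dt = -x, X0 = {x <= 0}, Xu = {x >= 2}.
in_X0 = lambda s: s <= 0.0
in_Xu = lambda s: s >= 2.0
samples = [-1.0, 0.95, 3.0]

# Candidate B(x) = x - 1 has Lie derivative -x; B(x) = 1 - x has Lie derivative x.
good = basic_loss(samples, lambda s: s - 1.0, lambda s: -s,
                  in_X0, in_Xu, 0.0, 0.0, 0.0, beta=0.1)
bad = basic_loss(samples, lambda s: 1.0 - s, lambda s: s,
                 in_X0, in_Xu, 0.0, 0.0, 0.0, beta=0.1)
```

The first candidate satisfies all three conditions on the samples and attains the minimum of the loss, whereas the second is penalised by every term.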


Enhanced Loss Functions The loss function in Eq. (4) experimentally exhibits
some drawbacks, which suggest a few ameliorations. Terms $L_0$ and $L_u$ solely
penalise samples with an incorrect value of $B(x)$ without further providing a reward
for samples with a correct value. The NN thus stops learning when the samples
return correct values of $B(x)$ without further increasing the positivity of $B$ over
$X_u$ or the negativity over $X_0$. As such, the training often returns a candidate
$B(x)$ with values just below $\tau_0$ in $X_0$ or above $\tau_u$ in $X_u$. These candidates are
easily falsified, thus potentially leading to a large number of CEGIS iterations.
We improve the learning by adopting a (saturated) Leaky ReLU, hence
rewarding samples that evaluate to a correct value of $B(x)$. Noting that


$$\mathrm{LeakyReLU}(\alpha, x) = \mathrm{ReLU}(x) - \alpha \, \mathrm{ReLU}(-x), \qquad (5)$$

where $\alpha$ is a small positive constant, we rewrite term $L_0$ as

$$L_0 = \frac{1}{N} \sum_{s_i \in X_0} \mathrm{ReLU}(B(s_i) - \tau_0) - \alpha \cdot \mathrm{satReLU}(-B(s_i) + \tau_0), \qquad (6)$$

where satReLU is the saturated ReLU function³. The term $L_u$ is similarly modified.
The composite loss function works as follows. Incorrect samples account for
the main contribution to the loss function, leading the NN to correct those first
via the ReLU term in Eq. (6). At a second stage, the network finds a direction of
improvement by following the leaky portion of the loss function. This is saturated
to prevent the training from following only one of these directions, without
improving the other loss terms.
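
The interplay between the penalising ReLU term and the saturated reward can be checked on a few numbers. The sketch below is written in plain Python under stated assumptions: the helper names, the saturation bound `M` and the sample values are hypothetical, and `N` is passed explicitly as the normalising constant.

```python
def relu(x):
    return max(0.0, x)


def sat_relu(x, M):
    """Saturated ReLU: min(max(0, x), M)."""
    return min(max(0.0, x), M)


def leaky_relu(alpha, x):
    """LeakyReLU(alpha, x) = ReLU(x) - alpha * ReLU(-x)."""
    return relu(x) - alpha * relu(-x)


def enhanced_L0(values, tau0, alpha, M, N):
    """`values` holds B(s_i) for the samples s_i in X0."""
    return sum(relu(b - tau0) - alpha * sat_relu(-b + tau0, M)
               for b in values) / N


# Hypothetical values of B on three samples of X0:
#   0.5  is incorrect (positive) and is penalised by the ReLU term,
#  -0.2  is correct and earns a small reward,
#  -10.0 is correct, but its reward saturates at alpha * M.
L0 = enhanced_L0([0.5, -0.2, -10.0], tau0=0.0, alpha=0.1, M=1.0, N=3)
```

The saturation caps the reward of the third sample, so that pushing $B$ ever more negative on one sample cannot dominate the remaining loss terms.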

Another possible drawback of the loss function in (4) derives from the term $L_d$:
it solely accounts for a penalisation of the sample points within $\mathcal{B}$. To quickly and
myopically improve the loss function, the training can generate a candidate BC
for which no samples are within $\mathcal{B}$ - we experimentally find that this behaviour
persists, regardless of the value of $\beta$. Similarly to $L_0$ and $L_u$, we reward the points
within a belt fulfilling the BC condition: namely, we solely apply the satReLU
function to reward samples $s$ with a negative $\dot{B}(s)$, whilst not penalising values
$\dot{B}(s) \ge 0$. The training is driven to include more samples in $\mathcal{B}$, guiding towards
a negative $\dot{B}(s)$, and finally enhancing learning. The expression of $L_d$ results in

$$L_d = -\frac{1}{N} \sum_{s \in \mathcal{B}} \mathrm{satReLU}(-\dot{B}(s) + \tau_d). \qquad (7)$$


3 Let us define $M$ to be an arbitrary upper bound, then
$\mathrm{satReLU}(x) = \min(\max(0, x), M)$.

Finally, we choose an asymmetric belt $\mathcal{B} = \{s : -\beta_1 < B(s) < \beta_2\}$, with $\beta_2 > \beta_1 > 0$,
to both ensure a wider sample set and a stronger safety certificate.


Multi-layer Networks Polynomial activation functions generate interpretable 
barrier certificates with analytical expressions that are readily verifiable by an 
SMT solver. However, when considering polynomial networks, the use of multi-layer
architectures quickly increases the order of the barrier function: a $k$-layer
network with $\gamma$-th order activations returns a polynomial of degree up to $\gamma^k$,
since the orders multiply under composition. We have
experienced that deep NN provide numerical robustness to our method, although
the verification complexity increases with the order of the polynomial activation
functions used and with the depth of the NN. As a consequence, our procedure
leverages a deep architecture whilst maintaining a low-order polynomial by
interchanging linear and polynomial activations over adjacent layers. We have
observed that the use of linear activations, particularly in the output layer,
positively affects the training: they provide the robustness that is needed for the
synthesis of BC (see Experimental results), without increasing the order of the
network with new polynomial terms.


Learning in Separate Batches The structure of the conditions in (3) and the
learning loss in (4) naturally suggests a separate approach to training. We thus
split the dataset $S$ into three batches $S_0$, $S_u$ and $S_d$, each including samples
belonging to $X_0$, $X_u$ and $X$, respectively. For training, we compute the loss
function in a parallel fashion. Similarly, for the verifier, generated counter-examples
are added to the relevant batch.


3.2 Certification of the Barrier Neural Network, or Falsification via 
Counter-examples 


Every candidate BC function $B(x)$ generated by the learner needs to be
certified by the verifier. Equivalently, in practice the SMT-based verifier aims at
finding states that violate the barrier conditions in (3) over the continuous domain
$X$. To this end, we express the negation of such requirements, and formulate a
nonlinear constrained problem over real numbers, as

$$(x \in X_0 \wedge B(x) > 0) \ \vee \ (x \in X_u \wedge B(x) \le 0) \ \vee \ (B(x) = 0 \wedge \dot{B}(x) \ge 0). \qquad (8)$$


The verifier searches for solutions of the constraints in Eq. (8), which in general 
requires manipulating non-convex functions. This can be cumbersome and time- 
consuming, hence simple expressions of B can enhance the verification procedure. 
On the one hand, the soundness of our CEGIS procedure heavily relies on the 
correctness of SMT solving: an SMT solver never fails to assert the absence of 
solutions for (8). As a result, when it states that formula (8) is unsatisfiable, i.e. 
returns unsat, B(x) is formally guaranteed to fulfil the BC conditions in Eq. 
(3). On the other hand, the CEGIS algorithm offers flexibility in the choice of 
the verifier, hence we implement and discuss two SMT solvers: dReal [13] and 
Z3 [11]. dReal is a $\delta$-complete solver, namely the unsat decision is correct [12],
whereas when a solution for (8) is found, this comes with a $\delta$-error bound. The
value of $\delta$ characterises the procedure precision. In our setting, it is acceptable
to return spurious counter-examples: indeed, these are then used as additional 
samples and do not invalidate the sound outcomes of the procedure, but rather 
help synthesising a more robust barrier candidate. dReal is capable of handling 
non-polynomial terms, such as exponentials or trigonometric vector fields f for 
some of the models considered in Section 4. Z3 is a powerful, sound and complete 
SMT solver, namely its conclusions are provably correct both when it determines 
the validity of a BC candidate and when it provides counter-examples. The 
shortcoming of Z3 is that it is unable to handle non-polynomial formulae. 
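
To make the structure of the negated conditions concrete, the following sketch searches a finite grid for points satisfying one of the three clauses. It is emphatically not a substitute for an SMT solver: a grid search is unsound (an empty answer proves nothing), and the third clause is relaxed to a belt $|B(x)| \le \varepsilon$ since grid points rarely hit $B(x) = 0$ exactly. The function `falsify`, the toy model $f(x, y) = (-x, -y)$, the sets and the candidates are all hypothetical.

```python
def falsify(B, Bdot, in_X0, in_Xu, grid, eps):
    """Collect grid points violating the barrier conditions, clause by clause."""
    cex = {"init": [], "unsafe": [], "lie": []}
    for p in grid:
        b = B(p)
        if in_X0(p) and b > 0:
            cex["init"].append(p)        # B must be <= 0 on X0
        if in_Xu(p) and b <= 0:
            cex["unsafe"].append(p)      # B must be > 0 on Xu
        if abs(b) <= eps and Bdot(p) >= 0:
            cex["lie"].append(p)         # dB/dt must be < 0 near B = 0
    return cex


def candidate(c):
    """B(x, y) = x^2 + y^2 - c and its Lie derivative along f = (-x, -y)."""
    B = lambda p: p[0] ** 2 + p[1] ** 2 - c
    Bdot = lambda p: -2.0 * (p[0] ** 2 + p[1] ** 2)
    return B, Bdot


axis = [-2.0 + 0.25 * i for i in range(17)]
grid = [(x, y) for x in axis for y in axis]
in_X0 = lambda p: p[0] ** 2 + p[1] ** 2 <= 0.25   # disc around the origin
in_Xu = lambda p: p[0] >= 1.5                     # strip on the right

good = falsify(*candidate(1.0), in_X0, in_Xu, grid, eps=0.1)
bad = falsify(*candidate(4.0), in_X0, in_Xu, grid, eps=0.1)
```

For the second candidate the level set encloses part of the unsafe strip, so points such as $(1.5, 0)$ are returned under the `"unsafe"` clause.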


Prioritisation and Relaxation of Constraints The effectiveness of the
CEGIS framework is underpinned by rapid exchanges between the learner and
the verifier, as well as by quick NN training and SMT verification procedures.
We have experienced that the bottleneck resides in the handling of the constraint
$\eta_d = (B(x) = 0 \wedge \dot{B}(x) \ge 0)$ by the SMT solver, since the formula contains the
high-order expression $\dot{B}(x)$ and because it is defined over the thin region of the
state space implicitly characterised by $B(x) = 0$. As a consequence, we have
prioritised constraints $\eta_0 = (x \in X_0 \wedge B(x) > 0)$ and $\eta_u = (x \in X_u \wedge B(x) \le 0)$:
that is, if either clause is satisfied, i.e. a counter-example is found for at least one
of them, the verifier omits testing $\eta_d$ whilst the obtained counter-examples are
passed to the learner. The constraint $\eta_d$ is thus checked solely if $\eta_0$ and $\eta_u$ are both
deemed to be unsat. Whenever this occurs, and the verification of $\eta_d$ times out,
the solver searches for a solution of a relaxed constraint $(|B(x)| < \tau_v \wedge \dot{B}(x) \ge 0)$,
similarly to the improved learning conditions discussed in Eq. (7). Whilst this
constraint is arguably easier to solve in general, it may generate spurious counter-examples,
namely a sample $\bar{x}$ that satisfies the relaxed constraint, but such that
$B(\bar{x}) \neq 0$. The generation of these samples does not contradict the soundness of
the procedure, and indeed improves the robustness of the next candidate BC -
this of course comes with the cost of increasing the number of CEGIS iterations.
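
The order in which the clauses are checked can be summarised by the control-flow sketch below. The four checkers are hypothetical stand-ins for SMT queries, each returning a status in {"sat", "unsat", "timeout"} together with a list of counter-examples; the treatment of a timeout on the first two clauses is an assumption, as it is not discussed above.

```python
def prioritised_verify(check_init, check_unsafe, check_lie, check_lie_relaxed):
    """Check the two cheap clauses first; query the Lie clause only if both
    are unsat, and fall back to the relaxed Lie clause upon a timeout."""
    cex, statuses = [], []
    for check in (check_init, check_unsafe):
        status, found = check()
        statuses.append(status)
        if status == "sat":
            cex.extend(found)
    if cex:
        return "sat", cex                  # the Lie clause is skipped
    if any(s != "unsat" for s in statuses):
        return "timeout", []               # assumption: give up for this round
    status, found = check_lie()
    if status == "timeout":
        status, found = check_lie_relaxed()
    return status, found


# Stub checkers that record the order in which they are called.
calls = []

def stub(name, status, found):
    def check():
        calls.append(name)
        return status, found
    return check

first = prioritised_verify(stub("init", "sat", [(0.1, 0.2)]),
                           stub("unsafe", "unsat", []),
                           stub("lie", "unsat", []),
                           stub("relaxed", "unsat", []))
calls_first = list(calls)

calls.clear()
second = prioritised_verify(stub("init", "unsat", []),
                            stub("unsafe", "unsat", []),
                            stub("lie", "timeout", []),
                            stub("relaxed", "sat", [(1.0, 0.0)]))
calls_second = list(calls)
```

In the first run the expensive Lie clause is never queried; in the second run it times out and the relaxed clause supplies a (possibly spurious) counter-example.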


Increased Information from Counter-examples The verification task
encompasses an SMT solver attempting to generate a counter-example, namely
a (single) instance satisfying Eq. (8). However, a lone sample might not always
provide insightful information for the learner to process. Naively asking the SMT
solver to generate more than one counter-example can be in general expensive.
Specifically, the verifier solves Eq. (8) to find a first counter-example $\bar{x}$; then, to
find any additional sample, we include the statement $(x \neq \bar{x})$ and solve again for
the resulting formula. We are interested in finding numerous points invalidating
the BC conditions and feed them to the learner as a batch, or in increasing
the information generated by the verifier by finding a sample that maximises
the violation of the BC conditions. To this end, firstly we randomly generate a
cloud of points around the generated counter-example: in view of the continuity
of the candidate function $B$, samples around a counter-example are also likely
to invalidate the BC conditions. Secondly, for the original counter-example, we
compute the gradient of $B$ (or of $\dot{B}$) and follow the direction that maximises the
violation of the BC constraints. As such, we follow the $B$ (resp. $\dot{B}$) maximisation
when considering $x \in X_0$ (resp. $x$ s.t. $|B(x)| < \tau_v$), and vice versa when $x \in X_u$. This
gradient computation is extremely fast as it exploits the neural architecture, and
it provides more informative samples for further use by the learner.
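
Both augmentation steps can be sketched in a few lines of plain Python. The helpers `cloud` and `ascend`, the candidate $B$, the initial set and all numerical values are hypothetical; in particular, discarding cloud points that do not violate the conditions is an assumption made here for illustration.

```python
import random


def cloud(cex, radius, n, violates, rng):
    """Sample n points uniformly in a box around `cex`; keep the violating ones."""
    points = []
    for _ in range(n):
        p = [c + rng.uniform(-radius, radius) for c in cex]
        if violates(p):
            points.append(p)
    return points


def ascend(g, grad_g, x, inside, step=0.1, iters=50):
    """Follow the gradient of g from x while g increases and x stays in the set."""
    x = list(x)
    for _ in range(iters):
        cand = [xi + step * gi for xi, gi in zip(x, grad_g(x))]
        if not inside(cand) or g(cand) <= g(x):
            break
        x = cand
    return x


# Hypothetical candidate that is positive on part of X0 (a disc of radius 0.5).
B = lambda p: p[0] ** 2 + p[1] ** 2 - 0.04
grad_B = lambda p: [2.0 * p[0], 2.0 * p[1]]
in_X0 = lambda p: p[0] ** 2 + p[1] ** 2 <= 0.25
violates = lambda p: in_X0(p) and B(p) > 0

cex = [0.4, 0.0]                                   # B(cex) = 0.12 > 0 inside X0
extra = cloud(cex, 0.05, 20, violates, random.Random(0))
worst = ascend(B, grad_B, cex, in_X0)              # larger violation, still in X0
```

The point `worst` lies closer to the boundary of the initial set, where the violation of $B(x) \le 0$ is larger than at the original counter-example.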


Algorithm 1 Synthesis of Neural Barrier Certificate

function LEARNER(S, f)
    repeat
        B(S) ← NN(S)
        Ḃ(S) ← ∇B(S) · f(S)
        compute loss L, update NN
    until convergence
    return NN
end function

function VERIFIER(B, Ḃ)
    encode conditions in (8)
    Cex or unsat ← SMTcheck(B, Ḃ)
    return Cex or unsat
end function

function CEGIS(f)
    initialise NN, S
    repeat
        NN ← LEARNER(S, f)
        B(x), Ḃ(x) ← Translate(NN, f)
        Cex or unsat ← VERIFIER(B, Ḃ)
        S ← S ∪ Cex
    until unsat
    return B(x), Ḃ(x)
end function
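
The loop structure of Algorithm 1 can be exercised on a toy problem. The sketch below is only meant to show the exchange between the two components: the "learner" picks a parameter from a finite list instead of training a NN, and the "verifier" scans a grid instead of calling an SMT solver, so it carries no soundness guarantee. The model $\dot{x} = -x$, the sets, the template $B_c(x) = x^2 - c$ and all names are hypothetical.

```python
def in_X0(x):
    return -0.5 <= x <= 0.5


def in_Xu(x):
    return 2.0 <= x <= 3.0


def B(c, x):
    return x * x - c


def Bdot(c, x):
    return 2.0 * x * (-x)            # dB/dx * f(x), with f(x) = -x


def violated(c, x, eps=0.05):
    """True if x satisfies one clause of the negated barrier conditions."""
    return ((in_X0(x) and B(c, x) > 0)
            or (in_Xu(x) and B(c, x) <= 0)
            or (abs(B(c, x)) <= eps and Bdot(c, x) >= 0))


def learner(S, candidates):
    """Stand-in for NN training: first candidate with fewest violated samples."""
    return min(candidates, key=lambda c: sum(violated(c, s) for s in S))


def verifier(c, grid):
    """Stand-in for the SMT check: first violating grid point, if any."""
    return [x for x in grid if violated(c, x)][:1]


def cegis(S, candidates, grid, max_iters=20):
    for iteration in range(1, max_iters + 1):
        c = learner(S, candidates)
        cex = verifier(c, grid)
        if not cex:
            return c, iteration      # no counter-example found on the grid
        S = S + cex                  # augment the samples and train again
    return None, max_iters


candidates = [5.0 - 0.5 * i for i in range(10)]    # 5.0, 4.5, ..., 0.5
grid = [-3.0 + 0.25 * i for i in range(25)]        # -3.0, ..., 3.0
result = cegis([0.0, 3.0], candidates, grid)
```

The first candidate ($c = 5$) is falsified by the unsafe point $x = 2$; once this counter-example is added to the samples, the learner returns $c = 3.5$, for which no violation is found.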


4 Case Studies and Experimental Results 


All experiments are performed on a laptop workstation with 8 GB RAM, running 
on Ubuntu 18.04. We demonstrate that the proposed method finds provably 
correct BCs on benchmarks from literature comprising both polynomial and 
non-polynomial dynamics: we compare our approach against the work [34], as 
this is the only work on sound synthesis of BCs with NNs to the best of our 
knowledge, and against the SOS optimisation software SOSTOOLS [18]. Beyond 
the benchmarks proposed in [34], we newly tackle a hybrid model as well as 
larger, (up to) 8-dimensional models, which push the boundaries of the verification 
engine and display a significant extension to the state of the art. To confirm 
the flexibility of our architecture, we employ SMT-solver dReal in the first four 
benchmarks, whereas we study the last four using Z3. In all the examples, we 
use a learning rate of 0.1 for the NN and the loss function in Section 3.1 with
$\alpha = 10^{-4}$, $\tau_0 = \tau_u = \tau_d = 0.1$. The region in Eq. (7) is limited by $\beta_1 = 0.1$, whilst
$\beta_2 = \infty$. Accordingly, the training over a large set $\mathcal{B}$ results in a candidate $B$
with a negative derivative over this large region, whose validity is more likely to
be certified by the verifier. We set a verification parameter $\tau_v = 0.05$ (cf. Sec.
3.2), a timeout (later denoted as OOT) of 60 seconds and the precision for dReal
to $\delta = 10^{-6}$. Table 1 summarises the outcomes. We emphasise that our approach
supports any network depth and width. The presented results seek a tradeoff 
between speed (low order, small networks) and expressiveness (high order, larger 
networks): a different architecture may result in a slower or faster synthesis. 
For the first four benchmarks, we compare our procedure, denoted as CEGIS,
with the repeated results from [34], which however does not handle the hybrid 
model in the fifth benchmark. We have run the algorithm in [34] and reported 
the cumulative synthesis time under the ‘Learn’ column. However the verification 
is not included in the repeatability package, hence we report the results from 
[34], which are generated with much more powerful hardware. Due to this issue 
of lack of repeatability, we have not run [34] on the larger models. Compared to 
[34], the outcomes suggest that we obtain much faster synthesis and verification 
times, whilst requiring up to only 0.1% (see Obstacle Avoidance Problem) of the 
training data: [34] performs a uniform sampling of the space X, hence suffers 
especially in the 3-D case, where the learning runs two orders of magnitude faster. 
Evidently this gap in performance derives from the different synthesis procedure: 
it appears to be more advantageous to employ a smaller, randomly sampled 
initial dataset that is progressively augmented with counter-examples, rather 
than to uniformly sample the state space to then train the neural network. 
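
The relative figures quoted for the comparison with [34] can be cross-checked with a few lines of arithmetic. The sketch below uses the learning times, verification times (in seconds) and sample counts (in thousands) as read from Table 1 for the first four benchmarks; the dictionary layout and the helper `ratios` are hypothetical.

```python
# (learn time, verify time, samples in thousands), as read from Table 1.
ours = {
    "Darboux": (31.6, 0.01, 0.5),
    "Exponential": (15.9, 0.07, 1.5),
    "Obstacle": (55.5, 1.83, 2.0),
    "Polynomial": (64.5, 4.20, 2.3),
}
ref34 = {
    "Darboux": (54.9, 20.8, 65.0),
    "Exponential": (234.0, 11.3, 65.0),
    "Obstacle": (3165.3, 1003.3, 2097.0),
    "Polynomial": (1731.0, 635.3, 65.0),
}


def ratios(name):
    """Return (total time ratio, sample ratio) of this work against [34]."""
    learn, verify, samples = ours[name]
    learn_ref, verify_ref, samples_ref = ref34[name]
    return (learn + verify) / (learn_ref + verify_ref), samples / samples_ref


obstacle_time, obstacle_data = ratios("Obstacle")
polynomial_time, _ = ratios("Polynomial")
exponential_time, _ = ratios("Exponential")
polynomial_speedup = 1.0 / polynomial_time
```

Under this reading of the table, the Obstacle Avoidance Problem uses about 0.1% of the data and a little over 1% of the total time of [34], the Exponential Model about 7% of the time, and the Polynomial Model is solved more than 30 times faster.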


Next, we have implemented the SOS optimisation problems in [10] within the 
software SOSTOOLS [18] to generate barrier candidates, which are polynomials 
up to order 4 (this is the maximum order of the polynomial candidates generated 
by our Learner). In a few instances we had to conservatively approximate the
expression of $X_0$ or $X_u$ in order to encode them as an SOS program - this makes
their applicability less general. SOSTOOLS has successfully found BC candidates
for five of the eight benchmarks, and they were generated consistently fast, in 
view of the convex structure of the underlying optimisation problem. However, 
recall that these techniques lack soundness (also due to numerical errors), which 
is instead a core asset of our approach. Consequently, we have passed them to 
the Z3 SMT solver, which should easily handle polynomial formulae: only one of 
them (‘Hybrid Model’) has been successfully verified; instead, the candidate for 
the ‘Polynomial Model’ has been invalidated (namely Z3 has found a counter- 
example for it), whereas the verification of the remaining BC candidates has 
run out of time. For the latter instances, we have experienced that SOSTOOLS 
generally returns numerically ill-conditioned expressions, namely candidates with 
coefficients of rather different magnitude, with many decimal digits: even after
rounding, expressions with this structure are known to be handled poorly by
SMT solvers [2,5], which results in long times needed to return an answer - this
explains the experienced timeouts. These experiments suggest that the use of
SOS programs within a CEGIS loop is hardly practical.


Notice that all the case studies are solved with a small number of iterations 
(up to 9) of the CEGIS loop: this feature, along with the limited runtimes, is 
promising towards tackling synthesis problems over larger models. 


For the eight case studies, we report below the full expressions of the dynamics 
of the models, the spatial domain $X$ (as a set of constraints), the set of initial
conditions $X_0 \subset X$, and the unsafe set $X_u \subset X$. We add a detailed analysis of
the CEGIS iterations involved in the synthesis of the corresponding BCs. 
Benchmark    | CEGIS (this work)               | BC from [34]              | SOS from [18]
             | Learn   Verify  Samples  Iters  | Learn   Verify  Samples   | Synth  Verify
Darboux      | 31.6    0.01    0.5k     2      | 54.9    20.8    65k       | ×      -
Exponential  | 15.9    0.07    1.5k     …      | 234.0   11.3    65k       | ×      -
Obstacle     | 55.5    1.83    2.0k     9      | 3165.3  1003.3  2097k     | ×      -
Polynomial   | 64.5    4.20    2.3k     …      | 1731.0  635.3   65k       | 8.10   ×
Hybrid       | 0.58    2.01    0.5k     …      | -       -       -         | 12.30  0.11
4-d ODE      | 29.31   0.07    1k       …      | -       -       -         | 12.90  OOT
6-d ODE      | 89.52   1.61    1k       …      | -       -       -         | 16.60  OOT
8-d ODE      | 104.5   82.51   1k       3      | -       -       -         | 26.10  OOT

Table 1. Outcomes of the case studies: cumulative times for the Learning and Verification
steps are given in seconds; ‘Samples’ indicates the size of input data for the Learner (in
thousands); ‘Iters’ is the number of iterations of the CEGIS loop (which is specific to
our work); × indicates a synthesis or verification failure; OOT denotes a verification
timeout. The Hybrid and the three ODE Models are newly introduced in this work.


Darboux Model This 2-dimensional model is approached using polynomial
BCs. Its analytical expression is

$$\begin{cases} \dot{x} = y + 2xy, \\ \dot{y} = -x + 2x^2 - y^2, \end{cases}$$

with domains $X_0 = \{0 \le x \le 1,\ 1 \le y \le 2\}$, $X_u = \{x + y^2 \le 0\}$.

The work [33] reports that methods based on linear matrix inequalities fail to 
verify this model using polynomial templates of degree 6. Our approach generates 
the BC shown in Fig. 2 (left) in approximately 30 seconds, roughly half as much 
as in [34], and using only 500 initial samples vs more than 65000. The initial 
and unsafe sets are depicted in green and red, respectively, whereas the level set 
B(x) = 0 is outlined in black. The BC is derived from a single-layer architecture 
of 10 nodes, with linear activations. 


Exponential Model This model from [17] shows that our approach extends to
non-polynomial systems encompassing exponential and trigonometric functions:

$$\begin{cases} \dot{x} = e^{-x} + y - 1, \\ \dot{y} = -\sin^2 x, \end{cases}$$

with domains $X_0 = \{(x + 0.5)^2 + (y - 0.5)^2 \le 0.16\}$,
$X_u = \{(x - 0.7)^2 + (y + 0.7)^2 \le 0.09\}$.


Our algorithm provides a valid BC in 16 seconds, around 7% of the results in [34], 
again using solely 1500 initial samples. The BC, depicted in Fig.2 (centre), results 
from a single-layer neural architecture of 10 nodes, with polynomial (y = 3) 
activation function. 


Obstacle Avoidance Problem This 3-dimensional model, originally presented
in [6], describes a robotic application: the control of the angular velocity of a
two-dimensional airplane, aimed at avoiding a still obstacle. The details are

$$\begin{cases} \dot{x} = v \sin\varphi, \\ \dot{y} = v \cos\varphi, \\ \dot{\varphi} = u, \end{cases} \qquad \text{where } u = -\sin\varphi + 3 \cdot \frac{x \sin\varphi + y \cos\varphi}{0.5 + x^2 + y^2},$$

with domains
$X = \{-2 \le x, y \le 2,\ -\pi/2 \le \varphi \le \pi/2\}$,
$X_0 = \{-0.1 \le x \le 0.1,\ -2 \le y \le -1.8,\ -\pi/6 \le \varphi \le \pi/6\}$,
$X_u = \{x^2 + y^2 \le 0.04\}$.
The BC is obtained from a single-layer NN comprising 10 neurons, using ($\gamma = 3$)
polynomial activations. Fig. 2 (right) plots the vector field on the plane z = 0.

Fig. 2. The BC for the Darboux (top left), Exponential (middle left), and Obstacle
Avoidance (the 3-D study, bottom left) models with corresponding vector fields (right
column). Initial and unsafe sets are represented in green and red, respectively; the black
line outlines the level curve B(x) = 0.


Our procedure takes 1% of the computational time in [34], providing a valid BC
in 9 iterations starting from an initial dataset of 2000 samples.


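Polynomial Model This model describes a polynomial system [22] and presents
initial and unsafe sets with complex, non-convex shapes [34], as follows:

$$\begin{cases} \dot{x} = y, \\ \dot{y} = -x + \frac{1}{3} x^3 - y, \end{cases}$$

with domains
$X = \{-3.5 \le x \le 2,\ -2 \le y \le 1\}$,
$X_0 = \{(x - 1.5)^2 + y^2 \le 0.25 \ \vee\ (x \ge -1.8 \wedge x \le -1.2 \wedge y \ge -0.1 \wedge y \le 0.1) \ \vee\ (x \ge -1.4 \wedge x \le -1.2 \wedge y \ge -0.5 \wedge y \le 0.1)\}$,
$X_u = \{(x + 1)^2 + (y + 1)^2 \le 0.16 \ \vee\ (x \ge 0.4 \wedge x \le 0.6 \wedge y \ge 0.1 \wedge y \le 0.5) \ \vee\ (x \ge 0.4 \wedge x \le 0.8 \wedge y \ge 0.1 \wedge y \le 0.3)\}$.


SOS-based procedures [16,29] have required high-order polynomial templates,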
which has suggested the use of alternative activation functions. The BC, shown 
in Fig. 3, is generated using a 10-neuron, two-layer NN with polynomial (y = 3) 
and tanh activations. Needing just around 1 min and only 2300 initial samples, 
the overall procedure is 30 times faster than that in [34]. 


Hybrid Model We challenge our procedure with a 2-dimensional hybrid model,
which extends beyond the capability of the results in [34]. This hybrid framework
partitions the set X into two non-overlapping subsets, $X_1$ and $X_2$. Each subset
is associated with different model dynamics, respectively $f_1$ and $f_2$. In other words,
the model trajectories evolve according to the $f_1$ dynamics when in $X_1$, and
according to $f_2$ when in $X_2$.

$$f_1 = \begin{cases} \dot{x} = y, \\ \dot{y} = -x - 0.5x^3, \end{cases} \qquad f_2 = \begin{cases} \dot{x} = y, \\ \dot{y} = [\ldots], \end{cases}$$

with domain for $f_1 = \{(x, y) : x < 0\}$, domain for $f_2 = \{(x, y) : x \ge 0\}$, and sets

$$X = \{x^2 + y^2 \le 4\}, \qquad X_0 = \{(x + 1)^2 + (y + 1)^2 \le 0.25\}, \qquad X_u = \{(x - 1)^2 + (y - 1)^2 \le 0.25\}.$$
The structure of this model represents a non-trivial task for the verification
engine, for which we employ the Z3 SMT solver. The learning phase has instead
been quite fast. The BC (Fig. 3) is obtained from a single-layer NN comprising 3
neurons, using polynomial activations with $\gamma = 2$, overall in less than 3 seconds,
starting with an initial dataset of 500 samples.

Fig. 3. The BC for the polynomial model (top left) and the hybrid model (top right)
with the respective vector field (below).


Larger-dimensional Models We finally challenge our procedure with three
high-order ODEs, respectively of order four, six and eight, to display the general
applicability of our counter-example guided BC synthesis. We consider dynamical
models described by the following differential equations:

$$x^{(4)} + 3980x^{(3)} + 4180x^{(2)} + 2400x^{(1)} + 576 = 0, \quad (9)$$
$$x^{(6)} + 800x^{(5)} + 2273x^{(4)} + 3980x^{(3)} + 4180x^{(2)} + 2400x^{(1)} + 576 = 0, \quad (10)$$
$$x^{(8)} + 20x^{(7)} + 170x^{(6)} + 800x^{(5)} + 2273x^{(4)} + 3980x^{(3)} + 4180x^{(2)} + 2400x^{(1)} + 576 = 0, \quad (11)$$

where we denote the i-th derivative of variable x by $x^{(i)}$. We translate each ODE
into a state-space model with variables $x_1, \ldots, x_j$, where $j \in \{4, 6, 8\}$, respectively.
In all three instances, we select as spatial domain X a hyper-sphere centred at
the origin of radius 4; an initial set $X_0$ as a hyper-sphere$^4$ centred at $+1^{[j]}$ of radius
0.25; an unsafe set $X_u$ as a hyper-sphere centred at $-2^{[j]}$ of radius 0.16. For
the synthesis, we employ for all case studies a single-layer, 5-node architecture
with polynomial ($\gamma = 1$) activation function. Whilst the verification
engine in particular is challenged by the high dimensionality of the models, the CEGIS
procedure returns a valid barrier certificate in up to 3 iterations and with very
reasonable run times.


4 We denote by $1^{[j]}$ the point of a j-dimensional state-space that has all its components
equal to 1. For instance, $1^{[3]}$ is the 3-dimensional point [1,1,1]. Similarly for $2^{[j]}$.

Codebase Robustness The results in Table 1 are obtained setting the NN
initialisation seed manually for repeatability. We now test the robustness of the
overall algorithm by randomising the initialisation seed. We report in Table 2
the percentage of successful runs, the average time and iteration count, along
with minimum and maximum values, over 50 runs. We set timeouts as a maximum
running time of 10 minutes, or as 12 CEGIS loops. Notice that small architectures
are highly susceptible to initialisations, which renders this test rather challenging.
Compared to Table 1, we notice similar performances for the Darboux,
Exponential and Hybrid models, vouching for the robustness of our approach.
However, the performance decreases when tackling the most challenging models.
Still, we highlight that the procedure can synthesise a valid BC very rapidly for
every benchmark (notice the lower bounds of the computational times). This
outcome suggests that a parallel approach, i.e. the procedure running on several
networks simultaneously, may be suited to quickly synthesise candidates. Overall,
the table shows a high degree of variance, possibly indicating the need for larger
architectures to enhance robustness.


Benchmark            | Success [%] | Iters avg [min, max] | Time avg [min, max]
Darboux Model        | 84.0        | 4.76 [1, 12]         | 75.33 [15.00, 189.25]
Exponential Model    | 76.0        | 5.20 [1, 12]         | 9.50 [3.17, 21.59]
Obstacle Avoidance   | 28.0        | 9.88 [1, 11]         | 129.24 [16.17, 549.41]
Polynomial Model     | 8.0         | 5.56 [5, 9]          | 335.32 [230.86, 377.91]
Hybrid Model         | 84.0        | 4.20 [1, 12]         | 36.75 [0.43, 102.03]
4-d ODE Model        | 32.0        | 9.00 [1, 12]         | 362.14 [29.42, 681.41]
6-d ODE Model        | 40.0        | 8.60 [1, 12]         | 310.45 [30.65, 562.67]
8-d ODE Model        | 12.0        | 11.00 [2, 12]        | 495.23 [111.50, 698.93]


Table 2. Percentage of successful runs, average number of iterations and average 
computational times (in seconds) of the CEGIS procedure, over 50 runs. The square 
brackets contain the minimum and maximum values obtained. 


5 Conclusions and Future Work 


We have presented a new inductive, formal, automated technique to synthesise 
neural-based barrier certificates for polynomial and non-polynomial, continuous 
and hybrid dynamical models. Thanks to a number of architectural choices for 
the new procedure, our method requires less training data and thus displays faster 
learning, as well as quicker verification time, than state-of-the-art techniques. 

Ongoing work is porting the presented and related [5,2] theoretical results into
a software tool [1]. Towards increased automation, future work includes the
development of an automated selection of activation functions that are tailored
to the dynamical models of interest.

Abstract. We propose a spurious region guided refinement approach for robustness
verification of deep neural networks. Our method starts with applying the
DeepPoly abstract domain to analyze the network. If the robustness property cannot
be verified, the result is inconclusive. Due to the over-approximation, the
computed region in the abstraction may be spurious in the sense that it does not
contain any true counterexample. Our goal is to identify such spurious regions
and use them to guide the abstraction refinement. The core idea is to make use of
the obtained constraints of the abstraction to infer new bounds for the neurons.
This is achieved by linear programming techniques. With the new bounds, we
iteratively apply DeepPoly, aiming to eliminate spurious regions. We have implemented
our approach in a prototypical tool DeepSRGR. Experimental results
show that a large number of regions can be identified as spurious, and as a result,
the precision of DeepPoly can be significantly improved. As a side contribution,
we show that our approach can be applied to verify quantitative robustness properties.


1 Introduction 


In the seminal work [34], deep neural networks (DNNs) were successfully applied
in Go to play against expert humans. Since then, they have achieved exceptional
performance in many other applications such as image, speech and audio recognition,
self-driving cars, and malware detection. Despite their success in solving these problems,
DNNs have also been shown to often lack robustness and to be vulnerable to
adversarial samples [39]. Even for a well-trained DNN, a small (and even imperceptible)
perturbation may fool the network. This is arguably one of the major obstacles to
deploying DNNs in safety-critical applications such as self-driving cars [42] and medical
systems [33].

It is thus important to guarantee the robustness of DNNs for safety-critical applications.
In this work, we focus on (local) robustness, i.e., given an input and a manipulation
region around the input (which is usually specified according to a certain
norm), we verify that a given DNN never makes any mistake on any input in the region.
The first work on DNN verification was published in [30], which focuses on DNNs
with sigmoid activation functions using a partition-refinement approach. In 2017, Katz
et al. [20] and Ehlers [10] independently implemented Reluplex and Planet, two SMT
solvers to verify DNNs with the ReLU activation function on properties expressible
with SMT constraints. Since 2018, abstract interpretation has been one of the most popular
methods for DNN verification, led by AI² [13]; subsequent works like
[36,37,23,1,35,28,24] have improved on AI² in terms of efficiency, precision, and support
for more activation functions (like sigmoid and tanh), so that abstract interpretation based
approaches can be applied to DNNs of larger size and more complex structures.


Among the above methods, DeepPoly [37] is one of the most outstanding regarding
precision and scalability. DeepPoly is an abstract domain specially developed for DNN
verification. It takes the structures and operators of a DNN into account, and
it designs a polytope expression which not only fits these structures and operators
to control the loss of precision, but also works with a very small time overhead to
achieve scalability. However, as an abstract interpretation based method, it provides
very little insight if it fails to verify the property. In this work, we propose a method
to improve DeepPoly by eliminating spurious regions through abstraction refinement.
A spurious region is a region computed using abstract semantics, conjoined with the
negation of the property to be verified. This region is spurious in the sense that if the
property is satisfied, then this region, although not empty, does not contain any true
counterexample which can be realized in the original program. In this case, we propose
a refinement strategy to rule out the spurious region, i.e., to prove that this region does
not contain any true counterexamples.


Our approach is based on DeepPoly and improves it by refinement of the spuri- 
ous region through linear programming. The core idea is to intersect the abstraction 
constructed by abstract interpretation with the negation of the property to generate a 
spurious region, and perform linear programming on the constraints of the spurious re- 
gion so that the bounds of the ReLU neurons whose behaviors are uncertain can be 
tightened. As a result, some of these neurons can be determined to be definitely acti- 
vated or deactivated, which significantly improves the precision of the abstraction given 
by abstract interpretation. This procedure can be performed iteratively and the precision
of the abstraction is gradually improved, so that we are likely to rule out this spurious
region in some iteration. If we successfully rule out all the possible spurious regions 
through such an iterative refinement, the property is soundly verified. Our method is 
similar in spirit to counterexample guided abstraction refinement (CEGAR) [6], i.e., 
we apply abstract interpretation for abstraction and linear programming for refinement. 
A fundamental difference is that we use the constraints of the spurious region, instead 
of a concrete counterexample (which is challenging to construct in our setting), as the 
guidance of refinement. 


The same spurious region guided refinement approach is also effective in quantitative
robustness verification. Instead of requiring that all inputs in the region should
be correctly classified, a certain probability of error in the region is allowed. Quantitative
robustness is more realistic and general compared to the ordinary robustness, and a
DNN verified against quantitative robustness is useful in practice as well. The spurious
region guided refinement approach naturally fits this setting, since a comparatively
precise over-approximation of the spurious region implies a sound robustness confidence.
To the best of our knowledge, for DNNs, this is the first work to verify quantitative
robustness with a strict soundness guarantee, which distinguishes our approach from
the previous sampling based methods like [45,46,3].

In summary, our main contributions are as follows: 


— We propose spurious region guided refinement to verify robustness properties of 
deep neural networks. This approach significantly improves the precision of Deep- 
Poly and it can verify more challenging properties than DeepPoly. 

— We implement the algorithms as a prototype and run them on networks trained on 
popular datasets like MNIST and ACAS Xu. The experimental results show that our 
approach significantly improves the precision of DeepPoly in successfully verifying 
much stronger robustness properties (larger maximum radius) and determining the 
behaviors of a great proportion of uncertain ReLU neurons. 

— We apply our approach to solve the quantitative robustness verification problem with
a strict soundness guarantee. In the experiments, we observe that, compared to using
only DeepPoly, the bounds obtained by our approach can be up to two orders of magnitude
better.


Organisation of the paper. We provide preliminaries in Section 2. DeepPoly is recalled
in Section 3. We present our overall verification framework and the algorithm in Section 4,
and discuss quantitative robustness verification in Section 5. Section 6 evaluates
our algorithms through experiments. Section 7 reviews related works and concludes the
paper.


2 Preliminaries 


In this section we recall some basic notions on deep neural networks, local robustness
verification, and abstract interpretation. Given a vector $x \in \mathbb{R}^m$, we write $x_i$ to denote
its i-th entry for $1 \le i \le m$.


2.1 Robustness verification of deep neural networks 


In this work, we focus on deep feedforward neural networks (DNNs), which can be
represented as a function $f : \mathbb{R}^m \to \mathbb{R}^n$, mapping an input $x \in \mathbb{R}^m$ to its output $y = f(x) \in \mathbb{R}^n$.
A DNN f often classifies an input x by obtaining the maximum dimension
of the output, i.e., $\arg\max_{1 \le i \le n} f(x)_i$. We denote such a DNN by $C_f : \mathbb{R}^m \to C$,
which is defined by $C_f(x) = \arg\max_{1 \le i \le n} f(x)_i$, where $C = \{1, \ldots, n\}$ is the set of
classification classes.

A DNN has a sequence of layers, including an input layer at the beginning, followed
by several hidden layers, and an output layer in the end. The output of a layer is the input
of the next layer. Each layer contains multiple neurons, the number of which is known
as the dimension of the layer. The DNN f is the composition of the transformations
between layers. Typically an affine transformation followed by a non-linear activation
function is performed. For an affine transformation $y = Ax + b$, if the matrix A is not
sparse, we call such a layer fully connected. A DNN with only fully connected layers
and activation functions is a fully connected neural network (FNN). In this work, we
focus on the rectified linear unit (ReLU) activation function, defined as $\mathrm{ReLU}(x) = \max(x, 0)$
for $x \in \mathbb{R}$. Typically, a DNN verification problem is defined as follows:


Definition 1. Given a DNN $f : \mathbb{R}^m \to \mathbb{R}^n$, a set of inputs $X \subseteq \mathbb{R}^m$, and a property
$P \subseteq \mathbb{R}^n$, we need to determine whether $f(X) := \{f(x) \mid x \in X\} \subseteq P$ holds.


Local robustness describes the stability of the behaviour of a normal input under a
perturbation. The range of input under this perturbation is the robustness region. For a
DNN $C_f$ which performs classification tasks, a robustness property typically states
that $C_f$ outputs the same class on the robustness region.

There are various ways to define a robustness region, and one of the most popular
ways is to use the $L_p$ norm. For $x \in \mathbb{R}^m$ and $1 \le p < \infty$, we define the $L_p$ norm of
x to be $\|x\|_p = \left(\sum_{i=1}^{m} |x_i|^p\right)^{1/p}$, and its $L_\infty$ norm $\|x\|_\infty = \max_{1 \le i \le m} |x_i|$. We write
$B_p(x, r) := \{x' \in \mathbb{R}^m \mid \|x - x'\|_p \le r\}$ to represent a (closed) $L_p$ ball for $x \in \mathbb{R}^m$ and
$r > 0$, which is a neighbourhood of x as its robustness region. If we set $X = B_p(x, r)$
and $P = \{y \in \mathbb{R}^n \mid \arg\max_i y_i = C_f(x)\}$ in Def. 1, it is exactly the robustness
verification problem. Hereafter, we set $p = \infty$.
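
To make these definitions concrete, the following minimal sketch computes the norms, tests membership in an $L_\infty$ ball, and searches for a misclassified point by random sampling. It only illustrates the problem statement: sampling can falsify robustness but can never prove it, which is why a sound verifier is needed. The network `example_net` is a hypothetical two-input, two-output ReLU network, all names are illustrative, and classes are 0-indexed here, unlike in the text.

```python
import random


def linf_norm(x):
    """L-infinity norm: the largest absolute entry."""
    return max(abs(v) for v in x)


def lp_norm(x, p):
    """L_p norm for 1 <= p < infinity."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def in_linf_ball(center, r, point):
    """True if point lies in the closed L-infinity ball around center."""
    return linf_norm([a - b for a, b in zip(center, point)]) <= r


def classify(f, x):
    """Index of the largest output (0-indexed class label)."""
    out = f(x)
    return max(range(len(out)), key=lambda i: out[i])


def find_counterexample(f, x, r, samples=1000, seed=0):
    """Random search for a misclassified point; None proves nothing."""
    rng = random.Random(seed)
    label = classify(f, x)
    for _ in range(samples):
        cand = [xi + rng.uniform(-r, r) for xi in x]
        if classify(f, cand) != label:
            return cand
    return None


def example_net(x):
    """Hypothetical network: ReLU of an affine map of two inputs."""
    return [max(x[0] - x[1], 0.0), max(x[0] + x[1] + 2.5, 0.0)]
```

For `example_net` and radius 1 around the origin the second output always dominates, so no sample can be misclassified; enlarging the radius to 3 makes counterexamples easy to find.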


2.2 Abstract interpretation for DNN verification 


Abstract interpretation [7] is a static analysis method that aims to find an
over-approximation of the semantics of programs and other complex systems so as to verify
their correctness. Generally we have a function $f : \mathbb{R}^m \to \mathbb{R}^n$ representing the concrete
program, a set $X \subseteq \mathbb{R}^m$ representing the property that the input of the program satisfies,
and a set $P \subseteq \mathbb{R}^n$ representing the property to verify. The problem is to determine
whether $f(X) \subseteq P$ holds. However, in many cases it is difficult to calculate $f(X)$ and
to determine whether $f(X) \subseteq P$ holds. Abstract interpretation uses abstract domains
and abstract transformations to over-approximate sets and functions so that an
over-approximation of the output can be obtained efficiently.

Now we have a concrete domain $\mathcal{C}$, which includes X as one of its elements. To
make computation efficient, we need an abstract domain $\mathbb{A}$ to abstract elements in the
concrete domain. We assume that there is a partial order $\le$ on $\mathcal{C}$ and $\mathbb{A}$, which in our
settings is the subset relation $\subseteq$. We also have a concretization function $\gamma : \mathbb{A} \to \mathcal{C}$
which maps an abstract element to its concrete semantics, and $\gamma(a)$ is the least upper
bound of the concrete elements that can be soundly abstracted by $a \in \mathbb{A}$. Naturally
$a \in \mathbb{A}$ is a sound abstraction of $c \in \mathcal{C}$ if and only if $c \le \gamma(a)$.

The design of an abstract domain is one of the most important problems in abstract 
interpretation because it determines the efficiency and precision. In practice, we use 
a certain type of constraints to represent the abstract elements in an abstract domain. 
Classical abstract domains for Euclidean spaces include Box, Zonotope [14,15], and 
Polyhedra [38]. 

Not only do we need abstract domains to over-approximate sets, but we are also
required to adopt over-approximation to functions. Here we consider the lifting of the
function $f : \mathbb{R}^m \to \mathbb{R}^n$ defined as $T_f : \mathcal{P}(\mathbb{R}^m) \to \mathcal{P}(\mathbb{R}^n)$, $T_f(X) := f(X) = \{f(x) \mid x \in X\}$.
Now we have an abstract domain $\mathbb{A}_k$ for the k-dimensional Euclidean
space and the corresponding concretization $\gamma$. A function $T_f^\# : \mathbb{A}_m \to \mathbb{A}_n$ is a sound
abstract transformer of $T_f$ if $T_f \circ \gamma \subseteq \gamma \circ T_f^\#$.

When we have a sound abstraction $X^\# \in \mathbb{A}_m$ of X and a sound abstract transformer
$T_f^\#$, we can use the concretization of $T_f^\#(X^\#)$ to over-approximate $f(X)$ since we
have $f(X) = T_f(X) \subseteq T_f(\gamma(X^\#)) \subseteq \gamma \circ T_f^\#(X^\#)$. If $\gamma \circ T_f^\#(X^\#) \subseteq P$, the property
P is successfully verified. Obviously, verification through abstract interpretation is
sound but not complete. Hereafter, we write $f^\#$ to represent $T_f^\#$ for simplicity.
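
As an illustration of sound abstract transformers, the sketch below uses the Box (interval) domain, where an abstract element is a pair of lower and upper bound vectors. The function names are illustrative, and the small network in the usage note is a hypothetical example rather than one of the benchmarks.

```python
def affine_box(lower, upper, weights, bias):
    """Sound interval transformer for y = W x + b.

    For each output, a positive weight pairs with the matching input bound
    and a negative weight with the opposite one.
    """
    out_l, out_u = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_l.append(lo)
        out_u.append(hi)
    return out_l, out_u


def relu_box(lower, upper):
    """Sound interval transformer for ReLU (exact, as ReLU is monotone)."""
    return [max(l, 0.0) for l in lower], [max(u, 0.0) for u in upper]


# Hypothetical two-neuron layer on the input box [-1, 1] x [-1, 1].
pre_l, pre_u = affine_box([-1.0, -1.0], [1.0, 1.0],
                          [[1.0, -1.0], [1.0, 1.0]], [0.0, 2.5])
out_l, out_u = relu_box(pre_l, pre_u)
# Lower bound of (second output - first output) provable in Box.
margin = out_l[1] - out_u[0]
```

Here `margin` is negative, so Box cannot prove that the second output dominates the first even though this holds for every concrete input in the box. This is the incompleteness mentioned above: the abstraction is sound, but it loses the correlation between the two neurons.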


AI² [13] first adopted abstract interpretation to verify DNNs, and many subsequent
works like [36,37,23] focused on improving its efficiency and precision through, e.g.,
defining new abstract domains. As a deep neural network, the function $f : \mathbb{R}^m \to \mathbb{R}^n$
can be regarded as a composition $f = f_l \circ \cdots \circ f_1$ of its $l + 1$ layers, where $f_j$ performs
the transformation between the j-th and the (j + 1)-th layer, i.e., it can be an affine
transformation, or a ReLU operation. If we choose Box, Zonotope, or Polyhedra as the
abstract domain, then for linear transformations and the ReLU functions, their abstract
transformers have been developed in [13]. After we have abstract transformers $f_j^\#$ for
these $f_j$, we can conduct abstract interpretation layer by layer as $f_l^\# \circ \cdots \circ f_1^\#(X^\#)$.


3 A Brief Introduction to DeepPoly 


Our approach relies on the abstract domain DeepPoly [37], which is the state-of-the-art 
abstract domain for DNN verification. It defines the abstract transformers of multiple 
activation functions and layers used in DNNs. The core idea of DeepPoly is to give 
every variable an upper and a lower bound in the form of an affine expression using 
only variables that appear before it. It can express a polyhedron globally. Moreover, 
experimentally, it often has better precision than Box and Zonotope domains. 

We denote the n-dimensional DeepPoly abstract domain with $\mathbb{A}_n$. Formally an
abstract element $a \in \mathbb{A}_n$ is a tuple $(a^{\le}, a^{\ge}, l, u)$, where $a_i^{\le}$ and $a_i^{\ge}$ give the i-th variable
$x_i$ a lower bound and an upper bound, respectively, in the form of a linear combination
of variables which appear before it, i.e. $\sum_{k=1}^{i-1} w_k x_k + w_0$, for $i = 1, \ldots, n$, and
$l, u \in \mathbb{R}^n$ give the lower bound and upper bound of each variable, respectively. The
concretization of a is defined as

$$\gamma(a) = \{x \in \mathbb{R}^n \mid a_i^{\le} \le x_i \le a_i^{\ge},\ i = 1, \ldots, n\}. \quad (1)$$

The abstract domain $\mathbb{A}_n$ also requires that its abstract elements a satisfy the
invariant $\gamma(a) \subseteq [l, u]$. This invariant helps construct efficient abstract transformers.
For an affine transformation $x_i = \sum_{k=1}^{i-1} w_k x_k + w_0$, we set

$$a_i^{\le} = a_i^{\ge} = \sum_{k=1}^{i-1} w_k x_k + w_0.$$

By substituting the variables $x_j$ appearing in $a_i^{\le}$ with $a_j^{\le}$ or $a_j^{\ge}$ according to the sign of
their coefficients, at most $i - 1$ times, we can obtain a sound lower bound in the form of a linear
combination on input variables only, and $l_i$ can be computed immediately from the
range of input variables. A similar procedure also works for computing $u_i$.

Fig. 1. Framework of spurious region guided refinement
For a ReLU transformation $x_i = \mathrm{ReLU}(x_j)$, we consider two cases:

- If $l_j \ge 0$ or $u_j \le 0$, this ReLU neuron is definitely activated or deactivated,
respectively. In this case, this ReLU transformation actually performs an affine
transformation, and thus its abstract transformer can be defined as above.

- If $l_j < 0$ and $u_j > 0$, the behavior of this ReLU neuron is uncertain, and we
need to over-approximate this relation with a linear upper/lower bound. The best
upper bound is $a_i^{\ge} = \frac{u_j (x_j - l_j)}{u_j - l_j}$. For the lower bound, there are multiple choices
$a_i^{\le} = \lambda x_j$ where $\lambda \in [0, 1]$. We choose $\lambda \in \{0, 1\}$ which minimizes the area of the
constraints. Basically we have two abstraction modes here, corresponding to the
two choices of $\lambda$.
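
The two cases can be sketched as a small function returning the coefficients of the linear lower and upper bounds. This is an illustration rather than the DeepPoly implementation: the name `relu_relaxation` is hypothetical, and breaking the area tie at $u_j = -l_j$ in favour of $\lambda = 0$ is an assumption.

```python
def relu_relaxation(l, u):
    """Linear bounds for y = ReLU(x) with x in [l, u].

    Returns ((lam, 0.0), (slope, intercept)), meaning
    lam * x <= y <= slope * x + intercept on [l, u].
    """
    if l >= 0:  # definitely activated: y = x
        return (1.0, 0.0), (1.0, 0.0)
    if u <= 0:  # definitely deactivated: y = 0
        return (0.0, 0.0), (0.0, 0.0)
    slope = u / (u - l)
    upper = (slope, -slope * l)  # the line through (l, 0) and (u, u)
    # The enclosed area is (u - l) * u / 2 for lam = 0 and
    # (u - l) * (-l) / 2 for lam = 1, so lam = 1 wins exactly when u > -l.
    lam = 1.0 if u > -l else 0.0
    return (lam, 0.0), upper
```

For instance, `relu_relaxation(-2.0, 2.0)` yields the lower bound $y \ge 0$ and the upper bound $y \le 0.5x + 1$.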


Note that for a DNN with only ReLU as non-linear operators, over-approximation oc- 
curs only when there are uncertain ReLU neurons, which are over-approximated using 
a triangle. The key to improving the precision is thus to compute the bounds of the
uncertain ReLU neurons as precisely as possible, and to determine the behaviors of
most uncertain ReLU neurons.

DeepPoly also supports activation functions which are monotonically increasing,
convex on $(-\infty, 0]$ and concave on $[0, +\infty)$, like sigmoid and tanh, and it supports
max pooling layers. Readers can refer to [37] for details.


4 Spurious Region Guided Refinement 


We explain the main steps of our algorithm, as depicted in Fig. 1. For the input property
and network, we first employ DeepPoly as the initial step to compute $f^\#(X^\#)$. The
concretization of $f^\#(X^\#)$ is the conjunction of many linear inequalities given in Eq. 1,
and for the robustness property P, the negation $\neg P$ is the disjunction of several linear
inequalities $\neg P = \bigvee_{t \ne C_f(x)} (y_{C_f(x)} - y_t \le 0)$.

1. We check whether $f^\#(X^\#) \cap^\# (y_{C_f(x)} - y_t \le 0) = \bot$ holds for each t, which
follows the same method as DeepPoly, i.e., we compute the lower bound of $y_{C_f(x)} - y_t$
and see whether it is larger than 0. If so, the label t
cannot be the classification result, as it is dominated by $C_f(x)$. Otherwise, we have
$f^\#(X^\#) \cap^\# \neg P \ne \bot$, and we take the conjunction $\gamma(f^\#(X^\#)) \wedge \neg P$ as a potential spurious region,
which represents the intersection of the abstraction of the real semantics and the
negation of the property to verify. We call such a region spurious because if the
property is satisfied, then this region does not contain a true counterexample, i.e., a
pair of input and output $(x^*, y^*)$ such that $y^* = f(x^*)$ and $y^*$ violates the property
P. In this case, this region is spuriously constructed due to the abstraction of the
real semantics, where the counterexamples cannot be realized, and thus we aim to
rule out the spurious region.

2. If no potential spurious region is found, our algorithm safely returns yes. 

3. Assume now that we have a potential spurious region. The core idea is to use
the constraints of the spurious region to refine this spurious region. Here a natural
way to refine the spurious region is linear programming, since all the constraints
here are linear inequalities. If the linear program is infeasible, it indicates that
the region is spurious, and thus we can return an affirmative result. Otherwise, our
refinement will tighten the bounds of variables involved in the DNN, especially
the input variables and uncertain ReLU neurons, and these tightened bounds help
further give a more precise abstraction.

4. As our approach is based on DeepPoly, similarly, we cannot guarantee complete- 
ness. We set a threshold N of the number of iterations as a simple termination 
condition. If the termination condition is not reached, we run DeepPoly again, and 
return to the first step. 


Below we give an example, illustrating how refinement can help in robustness veri- 
fication. 


Example 1. Consider the network $f(x) = \mathrm{ReLU}\left(\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} x + \begin{pmatrix} 0 \\ 2.5 \end{pmatrix}\right)$ and the region
$B_\infty((0, 0)^T, 1)$. The robustness property P here is $y_2 - y_1 > 0$. We first invoke
DeepPoly: the lower bound of $y_2 - y_1$ given by DeepPoly is $-0.5$. As a result, the
robustness property cannot be verified directly. Fig. 2(a) shows details of the example.


We fail to verify the property in Example 1 because for the uncertain ReLU relation
$y_1 = \mathrm{ReLU}(x_3)$, the abstraction is imprecise, and the key to making the abstraction
more precise here is to obtain as tight a bound as possible for $x_3$.


Example 2. We use the constraints in Fig. 2(a) and additionally the constraint $y_2 - y_1 \le 0$
(i.e., $\neg P$) as the input of linear programming. Our aim is to obtain a tighter bound of
the input neurons $x_1$ and $x_2$, as well as the uncertain ReLU neuron $x_3$, so the objective
functions of the linear programming are $\min x_i$ and $\min -x_i$ for $i = 1, 2, 3$. All the
three neurons have a tighter bound after the linear programming (see the red part in
Fig. 2(b)). Fig. 2(b) shows the running of DeepPoly under these new bounds, where the
input range and the abstraction of the uncertain ReLU neuron are both refined. Now the
lower bound of $y_2 - y_1$ is 0.25, so DeepPoly successfully verifies the property.

(a) Initial run of DeepPoly, with $y_1 = \mathrm{ReLU}(x_3)$ and $y_2 = \mathrm{ReLU}(x_4)$:
  $x_1 \ge -1$, $x_1 \le 1$; $l_1 = -1$, $u_1 = 1$
  $x_2 \ge -1$, $x_2 \le 1$; $l_2 = -1$, $u_2 = 1$
  $x_3 \ge x_1 - x_2$, $x_3 \le x_1 - x_2$; $l_3 = -2$, $u_3 = 2$
  $x_4 \ge x_1 + x_2 + 2.5$, $x_4 \le x_1 + x_2 + 2.5$; $l_4 = 0.5$, $u_4 = 4.5$
  $y_1 \ge 0$, $y_1 \le 0.5x_3 + 1$; $l_5 = 0$, $u_5 = 2$
  $y_2 \ge x_4$, $y_2 \le x_4$; $l_6 = 0.5$, $u_6 = 4.5$

(b) After the linear programming step and a second run of DeepPoly:
  $x_1 \ge -1$, $x_1 \le 0$; $l_1 = -1$, $u_1 = 0$
  $x_2 \ge -1$, $x_2 \le -0.667$; $l_2 = -1$, $u_2 = -0.667$
  $x_3 \ge x_1 - x_2$, $x_3 \le x_1 - x_2$; $l_3 = -0.333$, $u_3 = 1$
  $x_4 \ge x_1 + x_2 + 2.5$, $x_4 \le x_1 + x_2 + 2.5$; $l_4 = 0.5$, $u_4 = 1.833$
  $y_1 \ge x_3$, $y_1 \le 0.75x_3 + 0.25$
  $y_2 \ge x_4$, $y_2 \le x_4$; $l_6 = 0.5$, $u_6 = 1.833$

Fig. 2. Example 1 (left) and Example 2 (right): where the red parts are introduced through linear
programming based refinement and the blue parts are introduced by a second run of DeepPoly.
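
The numbers in Example 2 can be reproduced with a small exact computation. The sketch below is not the paper's implementation and does not call an LP solver: because the example has only two inputs, the output variables are eliminated by hand (an assumption of this sketch), leaving a polygon in $(x_1, x_2)$ whose vertices are enumerated. The single extra half-plane comes from $y_2 \le y_1$, $y_2 = x_4$ and $y_1 \le 0.5x_3 + 1$, i.e. $x_1 + 3x_2 \le -3$. All names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations

# Half-planes a*x1 + b*x2 <= c describing the potential spurious region.
CONSTRAINTS = [
    (F(1), F(0), F(1)),    # x1 <= 1
    (F(-1), F(0), F(1)),   # x1 >= -1
    (F(0), F(1), F(1)),    # x2 <= 1
    (F(0), F(-1), F(1)),   # x2 >= -1
    (F(1), F(3), F(-3)),   # x4 <= 0.5*x3 + 1, rewritten over the inputs
]


def vertices(constraints):
    """Feasible intersection points of all pairs of boundary lines."""
    pts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines never intersect
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c for a, b, c in constraints):
            pts.append((x, y))
    return pts


def bounds(objective, pts):
    """Min and max of a linear objective, attained at polygon vertices."""
    vals = [objective(x, y) for x, y in pts]
    return min(vals), max(vals)


pts = vertices(CONSTRAINTS)
l1, u1 = bounds(lambda x, y: x, pts)      # -1, 0
l2, u2 = bounds(lambda x, y: y, pts)      # -1, -2/3
l3, u3 = bounds(lambda x, y: x - y, pts)  # -1/3, 1

# Second abstract run: new upper relaxation of y1 = ReLU(x3).
slope = u3 / (u3 - l3)   # 3/4
intercept = -slope * l3  # 1/4
# y2 - y1 >= x4 - (slope*x3 + intercept); both input coefficients are
# positive, so the minimum over the refined box is at the lower bounds.
margin = (1 - slope) * l1 + (1 + slope) * l2 + F(5, 2) - intercept
```

The resulting `margin` equals 1/4, matching the lower bound 0.25 reported in Example 2. For networks of realistic size the projection by hand is not possible and an LP solver is needed.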


4.1 Main algorithm 


Alg. 1 presents our algorithm. First we run abstract interpretation to find the uncertain
neurons and the spurious regions (Lines 2-5). For each possible spurious region, we have
a while loop which iteratively refines the abstraction. In each iteration we perform linear
programming to renew the bounds of the input neurons and uncertain ReLU neurons;
when we find that the bound of an uncertain ReLU neuron becomes definitely
non-negative or non-positive, the ReLU behavior of this neuron is renewed (Lines 14-20).
We use them to guide abstract interpretation in the next step (Lines 21-22). Here in
Line 22, we make sure that during the abstract interpretation, the abstraction of previously
uncertain neurons (namely the uncertain neurons before the linear programming step in
the same iteration) compulsorily follows the new bounds and new ReLU behaviors
given by the current $C_{\ge 0}$, $C_{\le 0}$, l, and u, where these bounds will not be renewed by
abstract interpretation, and the concretization of Y is defined as

$$\gamma(Y) = \{x \mid \forall i.\ Y_i^{\le} \le x_i \le Y_i^{\ge}\} \cap [l, u]. \quad (2)$$


The while loop ends when (i) either we find that the spurious region is infeasible 
(Line 11, 24) and we proceed to refine the next spurious region, with a label Verified 
True, (ii) or we reach the terminating condition and fail to rule out this spurious region, 
in which case we return UNKNOWN. If every while loop ends with the label Verified 
True, we successfully rule out all the spurious regions and return YES. An observation 
is that, if some spurious regions have been ruled out, we can add the constraints of their 
negation to make the current spurious region smaller so as to improve the precision 
(Line 9). 

Here we discuss the soundness of Alg. 1. We focus on the while loop and claim that 
it has the following loop invariant: 


Invariant 1 The abstract element Y over-approximates the intersection of the semantics
of f on $B_\infty(x, r)$ and the spurious region, i.e., $f(B_\infty(x, r)) \cap \mathit{Spu} \subseteq \gamma(Y)$.

Algorithm 1 Spurious region guided robustness verification

Input:
DNN f, input x, radius r.
Output:
Return "YES" if verified, or "UNKNOWN" otherwise.
 1: function VERIFY(f, x, r)
 2:   $Y_0 \leftarrow f^\#(B_\infty(x, r))$    ▷ abstract interpretation with DeepPoly
 3:   $V_u \leftarrow \{v \mid v$ was marked as uncertain in Line 2$\}$
 4:   $A = \{t \mid Y_0 \cap^\# (y_{C_f(x)} - y_t \le 0) \ne \bot\}$
 5:   if $A = \emptyset$ then return YES    ▷ otherwise $A = \{t_1, \ldots, t_l\}$
 6:   for $i \leftarrow 1$ to $l$ do
 7:     Verified ← False, $V \leftarrow V_u$, $Y \leftarrow Y_0$    ▷ denote $Y = (Y^{\le}, Y^{\ge}, l, u)$
 8:     $C_{\ge 0} \leftarrow \emptyset$, $C_{\le 0} \leftarrow \emptyset$    ▷ sets of newly activated/deactivated neurons
 9:     $\mathit{Spu} = (y_{C_f(x)} - y_{t_i} \le 0) \wedge \bigwedge_{j=1}^{i-1} (y_{C_f(x)} - y_{t_j} \ge 0)$
10:     while terminating condition not satisfied do
11:       if $Y \wedge \mathit{Spu}$ is infeasible then
12:         Verified ← True
13:         break
14:       for $v \in V \cup V_0$ do    ▷ $V_0$: set of input neurons
15:         $(l_v, u_v) \leftarrow \mathrm{LP}(Y \wedge \mathit{Spu}, v)$
16:       for $v \in V$ do
17:         if $l_v \ge 0$ then
18:           $C_{\ge 0} \leftarrow C_{\ge 0} \cup \{v\}$, $V \leftarrow V \setminus \{v\}$
19:         else if $u_v \le 0$ then
20:           $C_{\le 0} \leftarrow C_{\le 0} \cup \{v\}$, $V \leftarrow V \setminus \{v\}$
21:       $X \leftarrow \bigwedge_{v \in V_0} \{l_v \le v \le u_v\}$
22:       $Y \leftarrow f^\#(X)$ according to $C_{\ge 0}$, $C_{\le 0}$, l, and u
23:       $V \leftarrow \{v \mid v$ was marked as uncertain in Line 22$\} \setminus (C_{\ge 0} \cup C_{\le 0})$
24:       if $Y \cap^\# (y_{C_f(x)} - y_{t_i} \le 0) = \bot$ then
25:         Verified ← True
26:         break
27:     if Verified = False then return UNKNOWN
28:   return YES
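
The control flow of Alg. 1 can be sketched in a few lines once the abstract interpreter and the LP step are treated as black boxes. The sketch below is a simplification under stated assumptions: the four callables are hypothetical stand-ins supplied by the caller, the bookkeeping of the sets $C_{\ge 0}$ and $C_{\le 0}$ is assumed to live inside them, the constraints of already excluded regions (Line 9) are omitted, and the terminating condition is a plain iteration cap.

```python
def verify(abstract_run, candidates, refine_bounds, is_excluded, max_iters=5):
    """Control-flow sketch of a spurious region guided refinement loop.

    Hypothetical callables (not part of any real library):
      abstract_run(bounds)  -> abstract element (bounds=None on first run)
      candidates(y0)        -> labels t that cannot be excluded directly
      refine_bounds(y, t)   -> tightened bounds, or None if the LP over
                               the spurious region is infeasible
      is_excluded(y, t)     -> True if abstract element y rules out label t
    """
    y0 = abstract_run(None)
    for t in candidates(y0):
        y, verified = y0, False
        for _ in range(max_iters):
            bounds = refine_bounds(y, t)
            if bounds is None:        # infeasible: the region is spurious
                verified = True
                break
            y = abstract_run(bounds)  # rerun abstraction under new bounds
            if is_excluded(y, t):
                verified = True
                break
        if not verified:
            return "UNKNOWN"
    return "YES"
```

Passing the components as callables keeps the loop independent of the abstract domain and the LP backend, which is convenient for testing the control flow with simple stubs.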


The initialization of Y is $f^\#(B_\infty(x, r))$ and it is naturally an over-approximation.
The box X is obtained by linear programming on $Y \wedge \mathit{Spu}$, and $f^\#(X)$ is calculated
through abstract interpretation and the bounds given by linear programming on
$Y \wedge \mathit{Spu}$, and thus it remains an over-approximation. It is worth mentioning that, when
we run DeepPoly in Line 22, we are using the bounds obtained by linear programming
to guide DeepPoly, and this may violate the invariant $\gamma(a) \subseteq [l, u]$ mentioned in Sect. 3.
Nonetheless, soundness still holds since the concretization of Y is newly defined in
Eq. 2, where both items in the intersection over-approximate $f(B_\infty(x, r)) \cap \mathit{Spu}$. With
Invariant 1, Alg. 1 returns YES if for any possible spurious region Spu, the
over-approximation of $f(B_\infty(x, r)) \cap \mathit{Spu}$ is infeasible, which implies the soundness of
Alg. 1.

4.2 Iterative refinement of the spurious region


Here we present more theoretical insight on the iterative refinement of the spurious
region. An iteration of the while loop in Alg. 1 can be represented as a function $\mathcal{L} : \mathbb{A} \to \mathbb{A}$,
where $\mathbb{A}$ is the DeepPoly domain. An interesting observation is that the abstract
transformer $f^\#$ in the DeepPoly domain is not necessarily increasing, because different
input ranges, even if they have an inclusion relation, may lead to different choices of the
abstraction mode of some uncertain ReLU neurons, which may violate the inclusion
relation of abstraction. We have found such cases during our experiments, one of which is
illustrated in the following example.


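Example 3. Let $f(x) = \mathrm{ReLU}(x)$ with input ranges $I_1 = [-2, 1]$ and $I_2 = [-2, 3]$.
We have $f^\#(I_1) = \{(x_1, x_2)^T \in \mathbb{R}^2 \mid -2 \le x_1 \le 1,\ x_2 \ge 0,\ x_2 \le \frac{1}{3} x_1 + \frac{2}{3}\}$ and
$f^\#(I_2) = \{(x_1, x_2)^T \in \mathbb{R}^2 \mid -2 \le x_1 \le 3,\ x_2 \ge x_1,\ x_2 \le \frac{3}{5} x_1 + \frac{6}{5}\}$. We observe
$(1,0)^T \in f^\#(I_1)$ but $(1,0)^T \notin f^\#(I_2)$, which implies that the transformer $f^\#$ is not
increasing.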
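
The membership claims in Example 3 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes the usual DeepPoly ReLU relaxation $\lambda x_1 \le x_2 \le \frac{u(x_1 - l)}{u - l}$ with $\lambda = 0$ when $u \le -l$ and $\lambda = 1$ otherwise, which reproduces the two sets in the example. The function names are hypothetical.

```python
from fractions import Fraction

def deeppoly_relu(l, u):
    """Return (lam, slope, intercept) for an uncertain ReLU on [l, u], l < 0 < u.

    Assumed relaxation: lam * x1 <= x2 <= slope * x1 + intercept, where lam is
    0 if u <= -l and 1 otherwise (an assumption matching Example 3).
    """
    l, u = Fraction(l), Fraction(u)
    assert l < 0 < u, "the neuron must be uncertain"
    lam = 0 if u <= -l else 1
    slope = u / (u - l)
    intercept = -slope * l
    return lam, slope, intercept

def contains(l, u, x1, x2):
    """Check whether (x1, x2) lies in the abstraction of ReLU on [l, u]."""
    lam, slope, intercept = deeppoly_relu(l, u)
    return l <= x1 <= u and lam * x1 <= x2 <= slope * x1 + intercept

in_I1 = contains(-2, 1, 1, 0)   # point (1, 0) against the abstraction on I1
in_I2 = contains(-2, 3, 1, 0)   # same point against the abstraction on I2
print(in_I1, in_I2)
```

Although $I_1 \subseteq I_2$, the point is accepted for $I_1$ and rejected for $I_2$, because the lower bound switches from $x_2 \ge 0$ to $x_2 \ge x_1$.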


This fact also implies that $\mathcal{L}$ is not necessarily increasing, which violates the condition
of Kleene’s Theorem on fixed points [4].

Now we turn to the analysis of the sequence $\{Y_k = \mathcal{L}^k(f^\#(B_\infty(x,r)))\}_{k=1}^{\infty}$, where
$\mathcal{L}^1 := \mathcal{L}$ and $\mathcal{L}^k := \mathcal{L} \circ \mathcal{L}^{k-1}$ for $k \ge 2$. First we have the following lemma showing
that in our setting every decreasing chain $S$ in the DeepPoly domain $A$ has a meet
$\sqcap S \in A$.


Lemma 1. Let $A_n$ be the $n$-dimensional DeepPoly domain and $\{a^{(k)}\} \subseteq A_n$ a decreasing
bounded sequence of non-empty abstract elements. If the coefficients in $a_i^{(k),\le}$
and $a_i^{(k),\ge}$ are uniformly bounded, then there exists an abstract element $a^* \in A_n$ s.t.
$\gamma(a^*) = \bigcap_{k=1}^{\infty} \gamma(a^{(k)})$.


Remark: The condition that the coefficients in $a_i^{(k),\le}$ and $a_i^{(k),\ge}$ are uniformly bounded
is naturally satisfied in our setting, since in a DNN the coefficients and bounds
involved have only finitely many values. Readers can refer to [50] for a formal proof.

Lemma 1 implies that if our sequence $\{Y_k\}$ is decreasing, then the iterative refinement
converges to an abstract element in DeepPoly, which is the greatest fixed point of
$\mathcal{L}$ that is smaller than $f^\#(B_\infty(x,r))$. A sufficient condition for $\{Y_k\}$ being decreasing
is that during the abstract interpretation in every $Y_k$, every initial uncertain neuron
maintains its abstraction mode, i.e., its corresponding $\lambda$ does not change, before its ReLU
behavior is determined. A weaker sufficient condition for convergence is that, after
finitely many iterations, the abstraction mode of uncertain neurons no longer changes.

If the abstraction mode of uncertain neurons changes infinitely often, the sequence
$\{Y_k\}$ generally does not converge. In this case, we can consider a subsequence in
which every $Y_k$ is obtained with the same abstraction mode. It is easy to see that such
a subsequence must be decreasing and thus has a meet, which is an accumulation point
of the sequence $\{Y_k\}$. Since there are only finitely many choices of abstraction modes,
such an accumulation point exists for $\{Y_k\}$, and there are only finitely many accumulation
points. We summarize these results in the following theorem, which describes the
convergence behavior of our iterative refinement of the spurious region:


Theorem 2. There exists a subsequence $\{Y_{n_k}\}$ of $\{Y_k\}$ s.t. $\{Y_{n_k}\}$ is decreasing and
thus has a meet $\sqcap_{k=1}^{\infty} Y_{n_k}$. Moreover, the set

$\{\sqcap_{k=1}^{\infty} Y_{n_k} \mid \{Y_{n_k}\}$ is a decreasing subsequence of $\{Y_k\}\}$

is finite, and it is a singleton if exactly one abstraction mode of uncertain ReLU neurons
happens infinitely often.


Proof. Since the abstraction modes of uncertain ReLU neurons have only finitely many
choices, there must be one which happens infinitely often in the computation of the
sequence $\{Y_k\}$, and we choose the subsequence $\{Y_{n_k}\}$ in which every item is computed
through this abstraction mode. Obviously $\{Y_{n_k}\}$ is decreasing and thus has a meet.

For a decreasing subsequence $\{Y_{n_k}\}$, we can find a subsequence of it in which the
abstraction mode of uncertain ReLU neurons does not change, and the two have the same
meet. Since there are only finitely many choices of abstraction modes of uncertain
ReLU neurons, such accumulation points of $\{Y_k\}$ also have finitely many values. If
exactly one abstraction mode of uncertain ReLU neurons happens infinitely often,
obviously there is only one accumulation point of $\{Y_k\}$.


4.3 Optimizations 


In the implementation of our main algorithm, we propose the following optimizations 
to improve the precision of refinement. 


Optimization 1: More precise constraints in linear programming. In Line 15 of Alg. 1,
passing the linear constraints of the abstract element $Y$ directly to linear programming
is not the best choice, because the DeepPoly abstraction of uncertain ReLU neurons is not
the most precise one. Planet [10] has a component which gives a more precise linear
approximation for uncertain ReLU relations: it uses the linear constraints
$y \le \frac{u(x-l)}{u-l}$, $y \ge x$, and $y \ge 0$ to over-approximate the relation $y = \mathrm{ReLU}(x)$ with $x \in [l, u]$.
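
To see why the three-constraint relaxation is tighter, the sketch below compares it with a single-lower-bound relaxation of the DeepPoly kind on one point. It is an illustration under assumptions: the function names are hypothetical, and the DeepPoly form $\lambda x \le y \le \frac{u(x-l)}{u-l}$ with $\lambda \in \{0, 1\}$ is assumed rather than taken from the text.

```python
def in_deeppoly(x, y, l, u, lam):
    """Assumed DeepPoly-style relaxation: lam*x <= y <= u*(x - l)/(u - l)."""
    return l <= x <= u and lam * x <= y <= u * (x - l) / (u - l)

def in_triangle(x, y, l, u):
    """Triangle relaxation: y >= 0, y >= x, y <= u*(x - l)/(u - l)."""
    return l <= x <= u and y >= 0 and y >= x and y <= u * (x - l) / (u - l)

l, u = -2.0, 1.0
point = (1.0, 0.0)        # not on the ReLU graph, since ReLU(1) = 1
graph_point = (1.0, 1.0)  # on the ReLU graph

kept_by_deeppoly = in_deeppoly(*point, l, u, lam=0)
kept_by_triangle = in_triangle(*point, l, u)
print(kept_by_deeppoly, kept_by_triangle, in_triangle(*graph_point, l, u))
```

The spurious point survives the single lower bound $y \ge 0$ but is cut off by the extra constraint $y \ge x$, while points on the ReLU graph remain inside.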


Optimization 2: Priority to work on small spurious regions. In Line 6 of Alg. 1, we
determine the order of refining the spurious regions based on their sizes, i.e., a smaller
region is chosen earlier. This is based on the intuition that Alg. 1 works effectively if the
spurious region is small. After the small spurious regions are ruled out, the constraints of
large spurious regions can be tightened with the conjunction $\bigwedge_i (y_{C_f(x)} - y_{t_i} > 0)$.
It is difficult to strictly determine which spurious region is the smallest, and thus we
refer to the lower bound of $y_{C_f(x)} - y_{t_i}$ given by DeepPoly, i.e., the larger this lower
bound is, the smaller the spurious region is likely to be, and we perform the for loop in
Line 6 of Alg. 1 in this order.
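
The ordering heuristic amounts to a sort on the DeepPoly lower bounds. A minimal sketch follows; the dictionary layout, the function name, and the numeric values are hypothetical and only illustrate the rule "larger lower bound first".

```python
def refinement_order(lower_bounds):
    """Sort target labels t by decreasing lower bound of y_{C_f(x)} - y_t.

    lower_bounds maps each unverified label t to the (non-positive) lower
    bound reported for y_{C_f(x)} - y_t; larger means a likely smaller region.
    """
    return sorted(lower_bounds, key=lambda t: lower_bounds[t], reverse=True)

lower_bounds = {3: -0.8, 5: -0.1, 7: -2.4}  # hypothetical values
order = refinement_order(lower_bounds)
print(order)
```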


5 Quantitative Robustness Verification 


In this section we recall the notion of quantitative robustness and show how to verify a 
quantitative robustness property of a DNN with spurious region guided refinement.

In practice, we may not need a strict condition of robustness to ensure that an input $x$
is not an adversarial example. A notion of mutation testing is proposed in [44,43], under
which an input $x$ is considered normal if it has a low label change rate on its neighbourhood.
The authors estimate the label change rate of an input statistically, which motivates
us to give a formal definition of the property of having a low label change rate, and
to consider the verification problem for such a property. Below we recall the definition
of quantitative robustness [27], where a parameter $0 < \eta \le 1$ represents the
confidence of robustness.


Definition 2. Given a DNN $C_f : \mathbb{R}^m \to C$, an input $x \in \mathbb{R}^m$, $r > 0$, $0 < \eta \le 1$, and
a probability measure $\mu$ on $B_\infty(x,r)$, $f$ is $\eta$-robust at $x$, if

$\mu(\{x' \in B_\infty(x,r) \mid C_f(x') = C_f(x)\}) \ge \eta$.


Def. 2 has a tight association with label change rate, i.e., if $x$ is $\eta$-robust, then the label
change rate should be smaller than, or close to, $1 - \eta$. Hereafter, we set $\mu$ to be the
uniform distribution on $B_\infty(x,r)$.

It is natural to adapt spurious region guided refinement to quantitative robustness
verification. In Alg. 1, we do not return UNKNOWN when we cannot rule out a spurious
region, but record the volume of the box $X$ as an over-approximation of the Lebesgue
measure of the spurious region. After we have worked on all the spurious regions, we
calculate the sum of these volumes, and obtain a sound robustness confidence. Here we do
not calculate the volume of the spurious region itself because precisely computing the
volume of a high-dimensional polytope remains an open problem, and we do not use randomized
algorithms because they may not be sound.
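
Under the uniform distribution, the recorded box volumes translate into a confidence bound by dividing by the volume of the input ball. The sketch below shows one way to do this; it is an assumption-laden illustration (hypothetical function names, boxes given as per-dimension intervals, ratios taken per dimension to avoid overflow in high dimensions), not the tool's code.

```python
def box_fraction(box, x, r):
    """Volume of a box divided by the volume of B_inf(x, r).

    box is a list of (lo, hi) intervals, one per input dimension. The ratio
    is accumulated dimension by dimension, so no huge volumes are formed.
    """
    frac = 1.0
    for (lo, hi), xi in zip(box, x):
        lo, hi = max(lo, xi - r), min(hi, xi + r)  # clip to the ball
        frac *= max(hi - lo, 0.0) / (2 * r)
    return frac

def robustness_confidence(boxes, x, r):
    """Lower bound on eta: 1 minus the total fraction covered by the boxes."""
    return max(0.0, 1.0 - sum(box_fraction(b, x, r) for b in boxes))

x, r = [0.0, 0.0], 1.0
boxes = [[(-1.0, -0.5), (-1.0, 1.0)],   # covers 1/4 of the ball
         [(0.5, 1.0), (0.0, 1.0)]]      # covers 1/8 of the ball
confidence = robustness_confidence(boxes, x, r)
print(confidence)
```

Summing the fractions ignores overlaps between boxes, so the result can only under-estimate the confidence, which keeps the bound sound.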

We further improve the algorithm through the powerset technique [13], a classical and
effective way to enhance the precision of abstract interpretation. We split the input
region into several subsets, and run abstract interpretation on these subsets. In our
quantitative robustness verification setting, the powerset technique not only improves
the precision, but also accelerates the algorithm in some situations: if the subsets
have the same volume, and the percentage of the subsets on which we may fail to verify
robustness is already smaller than $1 - \eta$, then we have successfully verified
the $\eta$-robustness property.
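
The early-termination test for equal-volume subsets is a simple count. A minimal sketch, with a hypothetical function name and made-up outcomes, using the strict comparison stated in the text:

```python
def verified_eta_robust(split_results, eta):
    """Decide eta-robustness from per-subset outcomes.

    split_results holds one bool per equal-volume subset, True if robustness
    was verified on that subset. The property is accepted when the fraction
    of unverified subsets is smaller than 1 - eta.
    """
    failed = sum(1 for ok in split_results if not ok)
    return failed / len(split_results) < 1 - eta

three_failed = [True] * 29 + [False] * 3   # 3 of 32 subsets unverified
four_failed = [True] * 28 + [False] * 4    # 4 of 32 subsets unverified
print(verified_eta_robust(three_failed, 0.9), verified_eta_robust(four_failed, 0.9))
```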


6 Experimental Evaluation 


We implement our approach as a prototype called DeepSRGR. The implementation 
is based on a re-implementation of the ReLU and the affine abstract transformers of 
DeepPoly in Python 3.7 and we amend it accordingly to implement Alg. 1. We use 
CVXPY [8] as our modeling language for convex optimization problems and CBC [18] 
as the LP solver. It is worth mentioning that we ignore the floating point error in our 
re-implementation of DeepPoly because sound linear programming currently does not 
scale in our experiments. In the terminating condition, we set N = 5. The two op- 
timizations in Sect. 4.3 are adopted in all the experiments. All the experiments are 
conducted on a CentOS 7.7 server with 16 Intel Xeon Platinum 8153 @2.00GHz (16 
cores) and 512G RAM, and they use at most 96 sub-processes concurrently. Readers
can find all the source code and other experimental materials at
https://iscasmc.ios.ac.cn/ToolDownload/?Tool=DeepSRGR.


Datasets. We use MNIST [22] and ACAS Xu [12,17] as the datasets in our experiments.
MNIST contains 60 000 grayscale handwritten digits of the size 28 x 28. We can
train DNNs to classify the images by the written digits on them. The ACAS Xu system
aims to avoid airborne collisions for unmanned aircraft and it uses an observation
table to make decisions for the aircraft. In [19], the observation table is realized by
training DNNs instead of storing it.


Networks. On MNIST, we trained seven fully connected networks of the size 6 x 20,
3 x 50, 3 x 100, 6 x 100, 6 x 200, 9 x 200, and 6 x 500, where m x n refers to m
hidden layers and n neurons in each hidden layer, and we name them from FNN2 to
FNN8, respectively (we also have a small network FNN1 for testing). On ACAS Xu,
we randomly choose three networks used in [20], all of the size 6 x 50.


6.1 Improvement in precision 


First we compare DeepPoly and DeepSRGR in terms of their precision of robustness
verification. We consider the following two indices: (i) the maximum radius that the two
tools can verify, and (ii) the number of uncertain ReLU neurons whose behaviors can be
further determined by DeepSRGR. For each network, we randomly choose three images
from the MNIST dataset, and calculate, through a binary search on the seven FNNs, the
maximum radius that each of the two tools can verify. In column “# uncertain ReLU” we record
the number of uncertain ReLU neurons when first applying DeepPoly, and also
count how many of them are renewed, namely become definitely activated/deactivated
in later iterations when applying DeepSRGR.
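
The binary search for the maximum verifiable radius can be sketched as follows. This is an illustration under assumptions: `verify` stands for a call to either tool, the search treats verification as monotone in the radius (which an incomplete verifier need not guarantee), and the upper limit and tolerance are hypothetical parameters.

```python
def max_verified_radius(verify, hi, tol=1e-3):
    """Largest radius accepted by verify, up to tol, assuming monotonicity.

    verify(r) returns True if robustness is proved for radius r. The lower
    end of the search interval is always a radius that was accepted (or 0).
    """
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if verify(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Stand-in verifier for illustration: succeeds up to radius 0.034.
radius = max_verified_radius(lambda r: r <= 0.034, hi=0.1)
print(radius)
```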

Table 1 shows the results. We can see from Table 1 that DeepSRGR can verify
much stronger (i.e., larger maximum radius) robustness properties than DeepPoly. The
average number of iterations for ruling out a spurious region is 2.875, and about half
of the spurious regions can be ruled out within 2 iterations. DeepSRGR sometimes
determines the behaviors of a large proportion of uncertain ReLU neurons on large
networks: for the last image on the most challenging network FNN8, more than
ninety percent (1269 of 1371, i.e., 92.6%) of the uncertain neurons are renewed. The
improvement in precision evaluated in this experiment benefits the verification of both
robustness and quantitative robustness, which is why our method is effective in both tasks.
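
As a consistency check on the reported average, one can take the column sums of Table 1 over its 21 rows (128 spurious regions in total and a grand total of 368 iterations), reading the average as iterations per spurious region:

```latex
\[
\frac{\sum \text{GT}}{\sum \#\,\text{spurious regions}} \;=\; \frac{368}{128} \;=\; 2.875
\]
```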


6.2 Robustness verification performance 


In this setting, we randomly choose 50 samples from the MNIST dataset. We fix four
radii, 0.037, 0.026, 0.021, and 0.015, for the four networks FNN4 to FNN7, respectively,
and verify the robustness property with the corresponding radius on the 50 inputs. The
radii chosen here are very challenging for the corresponding networks.

         Maximum radius      # spurious  # uncertain ReLU   % renewed      # iterations
Network  DeepPoly  DeepSRGR  regions     Original  Renewed  MAX    AVG     MAX  GT
FNN2     0.034     0.047     6           51        38       74.5%  48.4%   3    17
         0.017     0.023     3           47        37       78.7%  51.8%   4    9
         0.017     0.023     1           34        25       73.5%  73.5%   4    4
FNN3     0.049     0.066     6           88        69       78.4%  60.9%   5    15
         0.025     0.033     7           94        85       90.4%  46.0%   5    18
         0.045     0.058     3           98        45       45.1%  27.2%   5    9
FNN4     0.045     0.060     6           180       102      56.7%  35.2%   5    19
         0.024     0.030     6           199       144      72.4%  36.5%   4    15
         0.035     0.046     2           155       103      66.5%  42.9%   3    7
FNN5     0.034     0.042     7           305       245      80.3%  37.8%   5    20
         0.016     0.019     5           315       204      64.8%  34.0%   4    14
         0.021     0.027     7           337       256      76.0%  34.9%   5    18
FNN6     0.022     0.026     7           683       271      39.7%  19.8%   4    18
         0.011     0.013     6           657       483      73.5%  36.7%   3    14
         0.021     0.025     8           723       169      23.4%  12.2%   5    21
FNN7     0.021     0.023     9           987       297      30.1%  10.0%   5    29
         0.010     0.011     5           877       648      73.9%  26.8%   3    11
         0.017     0.019     7           913       352      38.6%  24.3%   3    16
FNN8     0.037     0.044     9           1504      976      64.9%  45.9%   5    36
         0.020     0.022     9           1213      818      67.4%  33.3%   3    21
         0.033     0.040     9           1371      1269     92.6%  51.1%   5    37


Table 1. Maximum radius which can be verified by DeepPoly and DeepSRGR, and details of
DeepSRGR running on its maximum radius, where in the number of renewed uncertain neurons,
we show the largest one among the spurious regions. MAX, AVG, and GT mean the maximum,
the average, and the grand total among the spurious regions, respectively. The indices of the
three images are 414, 481, and 65 in the MNIST dataset.


Table 2 presents the results. As we can see, DeepSRGR can verify significantly more
properties than DeepPoly. Linear programming in DeepSRGR takes a large amount of
time in the experiment, and thus DeepSRGR is less efficient (a DeepPoly run takes no
more than 100 seconds on FNN7). Furthermore, we rerun the 15 running examples on FNN4
which are not verified by DeepSRGR, resetting the maximum number of
iterations to 20 and 50. We have the following observations:


— Two more properties (out of 15) are successfully verified when we change N to 20. 
No more properties can be verified even if we change N from 20 to 50. 

— In this experiment, 13 more spurious regions are ruled out, six of which take 6
iterations, one takes 7, two take 8, and the other four take 13, 22, 27, and 32
iterations, respectively. In these running examples, the average number of renewed
ReLU behaviors is 102.8, and a large proportion are renewed in the last iteration
(47.4% on average). Fig. 3 shows the detailed results.

— As for the 13 spurious regions which cannot be ruled out within 50 iterations, the
average number of renewed ReLU behaviors is only 8.54, which is significantly
lower than the average of the 13 spurious regions which are newly ruled out. In
these running examples, changes in ReLU behaviors and ReLU abstraction modes
do not happen after the 9th iteration, and the average number is 4.4.


                         # verified           Time (s)
Model  Size     Radius   DeepPoly  DeepSRGR   MAX     AVG
FNN4   3 x 100  0.037    14        35         3384    781
FNN5   6 x 100  0.026    19        31         7508    1689
FNN6   6 x 200  0.021    14        25         23157   6178
FNN7   9 x 200  0.015    25        36         61760   8960


Table 2. The number of properties that DeepPoly and DeepSRGR verify among the 50 inputs,
and the maximum/average running time of DeepSRGR.


[Figure: number of renewed ReLU behaviors for each running example, shown separately
for the last iteration and for the whole loop.]


Fig. 3. Number of renewed ReLU behaviors in the spurious regions newly ruled out.


We observe that, by increasing the termination threshold N from 5 to 50, only two
more properties out of 15 can be verified. This suggests that our method
can effectively identify the spurious regions that are relevant to verification of the
property within a small number of iterations.


6.3 Quantitative robustness verification on ACAS Xu networks 


We evaluate DeepSRGR for quantitative robustness verification on ACAS Xu networks. 
We randomly choose five inputs, and compute the maximum robustness radius for each 
input on the three networks with DeepPoly through a binary search. In our experiment, 
the radius for a running example is the maximum robustness radius plus 0.02, 0.03, 
0.04, 0.05, and 0.06. We use the powerset technique and the number of splits is 32. For 
DeepPoly, the robustness confidence it gives is the proportion of the splits on which 
DeepPoly verifies the property. 

Fig. 4 shows the results. We can see that DeepSRGR gives a significantly better
over-approximation of $1 - \eta$ than DeepPoly. That is, in more than 90% of the running
examples, our over-approximation is no more than one half of that given by DeepPoly, and
in more than 75% of the cases, our over-approximation is even smaller than one tenth of
that given by DeepPoly.


404 P. Yang et al. 


[Figure: scatter plot of the robustness confidence, i.e., the over-approximation of
$1 - \eta$ (%), given by DeepSRGR (vertical axis) against that given by DeepPoly
(horizontal axis); markers distinguish the verification radii 0.02, 0.03, 0.04, 0.05, and 0.06.]


Fig. 4. Quantitative robustness verification using DeepPoly and DeepSRGR


7 Related Works and Conclusion 


We have already discussed the papers most closely related to ours. Here we add some
more recent results. Marabou [21] has been developed as the next generation of Reluplex.
Recently, verification approaches based on abstraction of DNN models have been proposed
in [11,2]. In addition, alternative approaches based on constraint-solving [26,29,5,25],
layer-by-layer exhaustive search [16], global optimization [31,9,32], functional
approximation [47], reduction to two-player games [48,49], and star set abstraction [41,40]
have been proposed as well.

In this work, we propose a spurious region guided refinement approach for robust- 
ness and quantitative robustness verification of deep neural networks, where abstract 
interpretation calculates an abstraction, and linear programming performs refinement 
with the guidance of the spurious region. Our experimental results show that our tool 
can significantly improve the precision of DeepPoly, verify more robustness properties, 
and often provide a quantitative robustness with strict soundness guarantee. 

Abstract interpretation based frameworks are readily extensible to different DNN
models and different properties, and can incorporate different verification methods. As
future work, we will investigate how to increase the precision further by using more
precise linear over-approximations like [35].


Abstract. To ensure high availability, communication networks provide
resilient routing mechanisms that quickly change routes upon failures.
However, a fundamental algorithmic question underlying such mechanisms
is hardly understood: how to verify whether a given network
reroutes flows along feasible paths, without violating capacity constraints,
for up to k link failures? We chart the algorithmic complexity landscape
of resilient routing under link failures, considering shortest path routing
based on link weights as deployed, e.g., in the ECMP protocol. We study
two models: a pessimistic model where flows interfere in a worst-case
manner along equal-cost shortest paths, and an optimistic model where
flows are routed in a best-case manner, and we present a complete picture
of the algorithmic complexities. We further propose a strategic search
algorithm that checks only the critical failure scenarios while still providing
correctness guarantees. Our experimental evaluation on a benchmark of
Internet and datacenter topologies confirms an improved performance of
our strategic search by several orders of magnitude.


1 Introduction 


Routing and traffic engineering are among the most fundamental tasks in a communication
network. Internet Service Providers (ISPs) today use several sophisticated
strategies to efficiently provision their backbone network to serve intra-domain 
traffic. This is challenging as in addition to simply providing reachability, rout- 
ing protocols should also account for capacity constraints: to meet quality-of- 
service guarantees, congestion must be avoided. Intra-domain routing protocols 
are usually based on shortest paths, and in particular the Equal-Cost-MultiPath 
(ECMP) protocol [24]. Flows are split at nodes where several outgoing links are 
on shortest paths to the destination, based on per-flow static hashing [7,30]. In 
addition to default routing, most modern communication networks also provide 
support for resilient routing: upon the detection of a link failure, the network 
nodes quickly and collaboratively recompute the new shortest paths [21]. 

However, today, we still do not have a good understanding of the algorithmic
complexity of shortest path routing subject to capacity constraints, especially
under failures. In particular, in this paper we are interested in the basic question:
“Given a capacitated network based on shortest path routing (defined by link
weights), can the network tolerate up to k link failures without violating capacity
constraints?” Surprisingly, only little is known about the complexity aspects.


Fig. 1: Classification of possible network situations


(a) Without link failures (k = 0)

            Pessimistic     Optimistic
Splittable  NL-complete     P-complete
Nonsplit.   NL-complete     NP-complete

(b) With link failures (k > 0)

            Pessimistic     Optimistic
Splittable  co-NP-complete  co-NP-complete
Nonsplit.   co-NP-complete  $\Pi_2^P$-complete


Fig. 2: Summary of complexity results for capacity problems


Our Contributions. We provide a complete characterization of the algorithmic 
complexity landscape of resilient routing and introduce two basic models of how 
traffic is distributed across the multiple shortest paths. A pessimistic (P) one 
where flows add up in a worst-case manner; if a network is resilient in the pes- 
simistic model, it is guaranteed that routing succeeds along any shortest path 
without overloading links. In the optimistic (O) model flows add up in a best- 
case manner; if a network is resilient in the optimistic model, it may be that the 
specific routing does not overload the links. The two models hence cover the two
extremes of the spectrum; alternative routing schemes, e.g., (pseudo)random
routing, lie in between. Figure 1 illustrates the situations that can arise
in a network: depending on the scenario, pessimistic (P) or optimistic (O), and 
whether the routing feasibility test is positive or negative, we can distinguish 
between three regimes. (1) If routing is feasible even in the pessimistic case, 
then flows can be safely forwarded by any routing policy without violating any 
capacity constraints. (2) If the pessimistic test is negative but positive in the 
optimistic case, then further considerations are required to ensure that flows use 
the feasible paths (e.g., a clever routing algorithm to find the suitable paths is 
needed). (3) If even the optimistic test is negative then no feasible routing solu- 
tion exists; to be able to successfully route flows in this case, we need to change 
the network characteristics, e.g., to increase the link capacities. 
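
The three regimes can be read as a small decision rule over the outcomes of the two feasibility tests. A minimal sketch, with a hypothetical function name; the two boolean inputs are assumed to come from whatever procedure decides the pessimistic and optimistic problems:

```python
def classify_regime(pessimistic_ok, optimistic_ok):
    """Map the outcomes of the two feasibility tests to regimes (1), (2), (3)."""
    if pessimistic_ok:
        return 1  # any shortest-path routing policy is safe
    if optimistic_ok:
        return 2  # feasible paths exist, but the routing must pick them
    return 3      # no feasible routing; the network itself must change

print(classify_regime(True, True), classify_regime(False, True), classify_regime(False, False))
```

A positive pessimistic answer together with a negative optimistic one cannot occur, so the first branch does not need to inspect the optimistic result.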

We further distinguish between splittable (S) and nonsplittable (N) 
flows, and refer to the four possible problems by PS, PN, ON, and OS. Our 
main complexity results are summarized in Figure 2. We can see that without 
link failures (Figure 2a), the problems are solvable in polynomial time, except 
for the ON problem that becomes NP-complete. Moreover, the pessimistic vari- 
ants of the problem can be solved even in nondeterministic logarithmic space, 
implying that they allow for efficient parallelization [33]. On the other hand, the 
optimistic splittable problem is hard for the class P. For the problems with link 
failures (Figure 2b) the complexity increases and the problems become co-NP-complete,
apart from the ON problem that becomes more difficult to solve and
is complete for the second level of the polynomial hierarchy [33].

The high computational complexity of the instances with link failures may
indicate that a brute-force search algorithm exploring all failure scenarios is 
needed to verify whether routing is feasible. However, we present a more effi- 
cient solution, by defining a partial ordering on the possible failure scenarios 
with the property that for the pessimistic model, we only need to explore the 
minimum failure scenarios, and for the optimistic model, it is sufficient to explore 
the maximum failure scenarios. We present an efficient strategic search algorithm 
implementing these ideas, formally prove its correctness, and demonstrate the 
practical applicability of strategic search on a benchmark of Internet and data- 
center topologies. In particular, we find that our algorithm achieves up to several 
orders of magnitude runtime savings compared to the brute-force search. 
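
For comparison, the brute-force baseline has to visit every failure set of size at most k. The sketch below only enumerates those sets to show how their number grows; it does not implement the strategic search, and the function name and the example links are hypothetical.

```python
from itertools import combinations
from math import comb

def failure_scenarios(links, k):
    """Yield every set F of failed links with |F| <= k."""
    for size in range(k + 1):
        for subset in combinations(links, size):
            yield frozenset(subset)

links = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
scenarios = list(failure_scenarios(links, 2))
expected = sum(comb(len(links), i) for i in range(3))
print(len(scenarios), expected)
```

With m links the count is $\sum_{i \le k} \binom{m}{i}$, which is why pruning the explored scenarios matters on larger topologies.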


Related Work. Efficient traffic routing has received much attention in the liter- 
ature, and there also exist empirical studies on the efficiency of ECMP deploy- 
ments, e.g., in Internet Service Provider Networks [17] or in datacenters [22]. A 
systematic algorithmic study of routing with ECMP is conducted by Chiesa et al. 
in [10]. The authors show that in the splittable-flow model [16], even approximat- 
ing the optimal link-weight configuration for ECMP within any constant factor 
is computationally intractable. Before their work, it was only known that mini- 
mizing congestion is NP-hard (even to just provide “satisfactory” quality [2] and 
also under path cardinality constraints [5]) and cannot be approximated within 
a factor of 3/2 [19]. For specific topologies the authors further show that traf- 
fic engineering with ECMP remains suboptimal and computationally hard for 
hypercube networks. We significantly extend these insights into the algorithmic 
complexity of traffic engineering and introduce the concept of pessimistic and 
optimistic variants of routing feasibility and provide a complete characterization 
of the complexity of routing subject to capacity constraints, also in scenarios 
with failures. Accounting for failures is an important aspect in practice [13,31] 
but has not been studied rigorously in the literature before; to the best of our 
knowledge, so far there only exist heuristic solutions [18] with some notable ex- 
ceptions such as Lancet [8] (which however does not account for congestion). We 
propose to distinguish between optimistic and pessimistic flow splitting; existing 
literature typically revolves around the optimistic scenario. 

We note that while we focus on IP networks (and in particular shortest path 
routing and ECMP), there exist many interesting results on the verification and 
reachability testing in other types of networks and protocols, including BGP [4, 
15], MPLS [25,38], OpenFlow [1] networks, or stateful networks [29, 32, 41]. 
While most existing literature focuses on verifying logical properties, such as 
reachability without considering capacity constraints, there also exist first works 
dealing with quantitative properties [20, 26, 29]. 


2 Network with Capacities and Demands 


We shall now define the model of network with link capacities and flow demands
and formally specify the four variants of the resilient routing problem. Let $\mathbb{N}$ be
the set of natural numbers and $\mathbb{N}^0$ the set of nonnegative integers.

Definition 1 (Network with Capacities and Demands). A Network with
Capacities and Demands (NCD) is a triple $N = (V, C, D)$ where $V$ is a finite
set of nodes, $C : V \times V \to \mathbb{N}^0$ is the capacity function for each network edge
(capacity 0 implies the absence of a network link), and $D : V \times V \to \mathbb{N}^0$ is the
end-to-end flow demand between every pair of nodes such that $D(v,v) = 0$ for
all $v \in V$ (demand 0 means that there is no flow).
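
One possible in-memory representation of an NCD is sketched below. The class and field names are hypothetical; capacities and demands are stored sparsely, with a missing entry standing for 0.

```python
from dataclasses import dataclass, field

@dataclass
class NCD:
    """Network with Capacities and Demands, stored sparsely."""
    nodes: set
    capacity: dict = field(default_factory=dict)  # (v, v') -> int >= 0, missing = 0
    demand: dict = field(default_factory=dict)    # (s, t) -> int >= 0, missing = 0

    def __post_init__(self):
        for (v, w), c in self.capacity.items():
            assert v in self.nodes and w in self.nodes and c >= 0
        for (s, t), d in self.demand.items():
            assert s in self.nodes and t in self.nodes and d >= 0
            assert s != t or d == 0, "D(v, v) must be 0"

    def C(self, v, w):
        return self.capacity.get((v, w), 0)

    def D(self, s, t):
        return self.demand.get((s, t), 0)

    def edges(self):
        """Pairs with positive capacity, i.e., the links that exist."""
        return {e for e, c in self.capacity.items() if c > 0}

net = NCD({"a", "b", "c"}, {("a", "b"): 2, ("b", "c"): 1}, {("a", "c"): 1})
print(net.C("a", "c"), net.D("a", "c"), sorted(net.edges()))
```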


Let $N = (V, C, D)$ be an NCD. A path from $v_1$ to $v_n$ where $v_1, v_n \in V$ is any
nonempty sequence of nodes $v_1 v_2 \cdots v_n \in V^+$ such that $C(v_i, v_{i+1}) > 0$ for all $i$,
$1 \le i < n$. Let $s, t \in V$. By $Paths(s,t)$ we denote the set of all paths from $s$ to $t$.
Let $\pi \in Paths(s,t)$ be a path in $N$ such that $\pi = v_1 v_2 \ldots v_n$. An edge is a pair
of nodes $(v, v') \in V \times V$ such that $C(v, v') > 0$. We write $(v, v') \in \pi$ whenever
$(v, v') = (v_i, v_{i+1})$ for some $i$, $1 \le i < n$.

Routes in an NCD are traditionally determined by annotating the links with 
weights and employing shortest path routing (e.g. ECMP). In case of multiple 
shortest paths, traffic engineers select either one of the shortest paths or decide 
to split the flow among the different shortest paths for load-balancing purposes. 
When one or multiple links fail, the set of shortest paths may change and the 
routes need to be updated. The weight assignment is usually provided by the 
network operators and is primarily used for traffic engineering purposes. 


Definition 2 (Weight Assignment). Let $N = (V, C, D)$ be an NCD. A weight
assignment on $N$ is a function $W : V \times V \to \mathbb{N} \cup \{\infty\}$ that assigns each link a
positive weight where $C(v, v') = 0$ implies that $W(v, v') = \infty$ for all $v, v' \in V$.


Assume now a fixed weight assignment for a given NCD $N = (V, C, D)$. Let
$\pi = v_1 v_2 \cdots v_n \in V^+$ be a path from $v_1$ to $v_n$. The weight of the path $\pi$ is
denoted by $W(\pi)$ and defined by $W(\pi) = \sum_{i=1}^{n-1} W(v_i, v_{i+1})$. Let $s, t \in V$. The
set of shortest paths from $s$ to $t$ is defined by
$SPaths(s,t) = \{\pi \in Paths(s,t) \mid W(\pi) \ne \infty \text{ and } W(\pi) \le W(\pi') \text{ for all } \pi' \in Paths(s,t)\}$.
As the weights are positive, all shortest paths in the set $SPaths(s,t)$ are acyclic and
hence the set is finite (though of possibly exponential size).
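
The definitions of $W(\pi)$ and $SPaths(s,t)$ can be illustrated by direct enumeration on a small network. This is a deliberately naive sketch with hypothetical function names: it lists all acyclic paths (sufficient because weights are positive) and keeps the lightest ones, so it is exponential in general.

```python
import math

def path_weight(W, path):
    """W(pi): sum of link weights along the path; a missing link counts as infinity."""
    return sum(W.get((a, b), math.inf) for a, b in zip(path, path[1:]))

def simple_paths(W, s, t, prefix=None):
    """Yield all acyclic paths from s to t over links with finite weight."""
    prefix = prefix or (s,)
    if s == t:
        yield prefix
        return
    for (a, b), w in W.items():
        if a == s and w != math.inf and b not in prefix:
            yield from simple_paths(W, b, t, prefix + (b,))

def shortest_paths(W, s, t):
    """SPaths(s, t) by enumeration: all acyclic paths of minimum weight."""
    paths = list(simple_paths(W, s, t))
    if not paths:
        return set()
    best = min(path_weight(W, p) for p in paths)
    return {p for p in paths if path_weight(W, p) == best}

W = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 1, ("b", "t"): 1, ("s", "t"): 3}
sp = shortest_paths(W, "s", "t")
print(sorted(sp))
```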

For a given NCD $N$ and a set of failed links $F$, we can now define the NCD
$N^F$ where all links from $F$ are removed.


Definition 3. Let $N = (V, C, D)$ be an NCD with weight assignment $W$, and let
$F \subseteq V \times V$ be a set of failed links. We define the pruned NCD $N^F = (V, C^F, D)$
with an updated weight assignment $W^F$ by

— $C^F(v, v') = C(v, v')$ and $W^F(v, v') = W(v, v')$ if $(v, v') \notin F$, and
— $C^F(v, v') = 0$ and $W^F(v, v') = \infty$ if $(v, v') \in F$.


By $Paths^F(s,t)$ and $SPaths^F(s,t)$ we denote the sets of the paths and shortest
paths between $s$ and $t$ in the network $N^F = (V, C^F, D)$ with $W^F$.
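
Pruning is a pointwise update of the capacity and weight functions. A minimal sketch, assuming both are given as dictionaries over links (a hypothetical representation):

```python
import math

def prune(C, W, F):
    """Return (C^F, W^F): failed links get capacity 0 and weight infinity."""
    CF = {e: (0 if e in F else c) for e, c in C.items()}
    WF = {e: (math.inf if e in F else w) for e, w in W.items()}
    return CF, WF

C = {("a", "b"): 2, ("b", "c"): 1}
W = {("a", "b"): 1, ("b", "c"): 5}
CF, WF = prune(C, W, {("b", "c")})
print(CF, WF)
```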

We shall now define a flow assignment that for each nonempty flow demand
between $s$ and $t$ and every failure scenario, determines the amount of traffic that
should be routed through the shortest paths between $s$ and $t$.


Definition 4 (Flow Assignment). A flow assignment $f$ in a capacity network
$N = (V, C, D)$ with weight assignment $W$ and with the set $F \subseteq V \times V$ of failed
links is a family of functions $f^F_{s,t} : SPaths^F(s,t) \to [0, 1]$ for all $s, t \in V$
where $D(s,t) > 0$ such that $\sum_{\pi \in SPaths^F(s,t)} f^F_{s,t}(\pi) = 1$. A flow assignment $f$
is nonsplittable if $f^F_{s,t}(\pi) \in \{0, 1\}$ for all $s, t \in V$ and all $\pi \in SPaths^F(s,t)$.
Otherwise the flow assignment is splittable.


The notation [0,1] denotes the interval of all rational numbers between 0 
and 1 and it determines how the load demand between the nodes s and t is split 
among the routing paths between the two nodes. A nonsplittable flow assignment 
assigns the value 1 to exactly one routing path between any two nodes s and t. 
If for a given failure scenario F there is no path between s and t for two nodes 
with D(s,t) > 0, then there is no flow assignment as the network is disconnected. 


Definition 5. An NCD $N = (V, C, D)$ is connected for the set of failed links
$F \subseteq V \times V$ if $SPaths^F(s,t) \ne \emptyset$ for every $s, t \in V$ where $D(s,t) > 0$.


For a connected NCD, we now define a feasible flow assignment that avoids 
congestion: the sum of portions of flow demands (determined by the flow assign- 
ment) that are routed through each link, may not exceed the link capacity. 


Definition 6 (Feasible Flow Assignment). Let $N = (V, C, D)$ be an NCD
with weight assignment $W$. Let $F \subseteq V \times V$ be the set of failed links s.t. the
network remains connected. A flow assignment $f$ is feasible if every link $(v, v') \in
V \times V$ with $C(v, v') > 0$ satisfies

$\sum_{\substack{s,t \in V \\ \pi \in SPaths^F(s,t) \\ (v,v') \in \pi}} f^F_{s,t}(\pi) \cdot D(s,t) \le C(v, v')$.

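
The feasibility condition can be checked by accumulating, for every link, the demand fractions routed over it. The sketch below assumes the flow assignment is given explicitly as a dictionary from demand pairs to path fractions, and it does not itself verify that the listed paths are shortest paths; all names are hypothetical.

```python
from fractions import Fraction

def link_loads(flow, demand):
    """Load per link for flow = {(s, t): {path: fraction}} and demand = {(s, t): int}."""
    loads = {}
    for (s, t), split in flow.items():
        assert sum(split.values()) == 1, "fractions for each demand must sum to 1"
        for path, frac in split.items():
            for link in zip(path, path[1:]):
                loads[link] = loads.get(link, 0) + frac * demand[(s, t)]
    return loads

def is_feasible(flow, demand, capacity):
    """True if no link carries more than its capacity."""
    return all(load <= capacity.get(link, 0)
               for link, load in link_loads(flow, demand).items())

capacity = {("s", "a"): 2, ("a", "t"): 2, ("s", "b"): 2, ("b", "t"): 2}
demand = {("s", "t"): 4}
half = Fraction(1, 2)
split_flow = {("s", "t"): {("s", "a", "t"): half, ("s", "b", "t"): half}}
single_flow = {("s", "t"): {("s", "a", "t"): 1}}
print(is_feasible(split_flow, demand, capacity), is_feasible(single_flow, demand, capacity))
```

In this toy network the splittable assignment fits the capacities while the nonsplittable one overloads the links of the chosen path.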
We consider four different variants of the capacity problem. 


Definition 7 (Pessimistic Splittable/Nonsplittable (PS/PN)). Given an
NCD N with a weight assignment and a nonnegative integer k, is it the case that
for every set F of failed links of cardinality at most k, the network remains
connected and every splittable/nonsplittable flow assignment on N with the set
F of failed links is feasible?


Definition 8 (Optimistic Splittable/Nonsplittable (OS/ON)). Given an 
NCD N with a weight assignment and a nonnegative integer k, is there a feasible 
splittable/nonsplittable flow assignment on N for every set of failed links F of 
cardinality at most k? 


A positive answer to the PN capacity problem implies positive answers to both
PS and ON problems. A positive answer to either the PS or ON problem implies a
positive answer to the OS problem. This is summarized in Figure 3 and it is
easy to argue that the hierarchy is strict.

Fig. 3: Hierarchy

3 Analysis of Algorithmic Complexity 


We now provide the arguments for the upper and lower bounds from Figure 2.

Algorithm 1 Computation of the shortest path graph function $spg^{s,t}$
Input: NCD $N = (V,C,D)$, weight assignment $W$ and $s,t \in V$
Output: Shortest path graph function $spg^{s,t} : V \times V \to \{0,1\}$
if $dist(s,t) = \infty$ then $spg^{s,t}(v,v') := 0$ for all $v,v' \in V$
else
    for $v,v' \in V$ do
        if $dist(s,t) = dist(s,v) + W(v,v') + dist(v',t)$ then $spg^{s,t}(v,v') := 1$
        else $spg^{s,t}(v,v') := 0$
return $spg^{s,t}$


Complexity Upper Bounds. We present first a few useful observations. Be- 
cause network connectivity can be checked independently for each source s and 
target t where D(s,t) > 0 by computing the maximum flow [14] between s and 
t, we obtain the following lemma. 


Lemma 1. Given an NCD $N = (V,C,D)$ and a nonnegative integer $k$, it is
polynomial-time decidable if $N$ remains connected for all sets of failed links
$F \subseteq V \times V$ where $|F| \le k$.


Next, we present an algorithm that for an NCD $N = (V,C,D)$ with the
weight assignment $W : V \times V \to \mathbb{N} \cup \{\infty\}$ and a given
pair of nodes $s,t \in V$ computes in polynomial time the function
$spg^{s,t} : V \times V \to \{0,1\}$ that assigns the value 1 to exactly those
edges that appear on at least one shortest path (w.r.t. the weight assignment
$W$) between $s$ and $t$. The edges that get assigned the value 1 hence form
the shortest path subgraph between $s$ and $t$. The algorithm uses the function
$dist(v,v')$ that for every two nodes $v,v' \in V$ returns the length of the
shortest path (again w.r.t. the assignment $W$) from $v$ to $v'$, and returns
$\infty$ if $v$ and $v'$ are not connected. Such an all-pairs shortest path
function can be precomputed in polynomial time using e.g. Johnson's algorithm [27].
The function $spg^{s,t}$ is defined by Algorithm 1.
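
A minimal Python sketch of Algorithm 1 follows. For brevity it computes the
all-pairs distances with the Floyd-Warshall algorithm instead of Johnson's
algorithm; this substitution, the dictionary-based graph representation and the
function names are assumptions of the example, not the paper's implementation.

```python
import math
from itertools import product

def all_pairs_dist(nodes, weight):
    """Floyd-Warshall; weight maps existing links (v, w) to nonnegative numbers."""
    dist = {(v, w): (0 if v == w else weight.get((v, w), math.inf))
            for v, w in product(nodes, repeat=2)}
    # The first component (the intermediate node m) varies slowest.
    for m, v, w in product(nodes, repeat=3):
        if dist[v, m] + dist[m, w] < dist[v, w]:
            dist[v, w] = dist[v, m] + dist[m, w]
    return dist

def spg(weight, dist, s, t):
    """Return the set of links (v, w) for which spg^{s,t}(v, w) = 1."""
    if math.isinf(dist[s, t]):
        return set()
    return {(v, w) for (v, w) in weight
            if dist[s, v] + weight[v, w] + dist[w, t] == dist[s, t]}

# Hypothetical example: s-a-t and the direct link s-t both have length 2.
nodes = ["s", "a", "b", "t"]
weight = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 1,
          ("b", "t"): 2, ("s", "t"): 2}
dist = all_pairs_dist(nodes, weight)
shortest_links = spg(weight, dist, "s", "t")
```

Here the links through b are excluded because the path s-b-t has length 3.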


Lemma 2. Let $N = (V,C,D)$ be an NCD with weight assignment $W$ and
$s,t \in V$. Algorithm 1 runs in polynomial time and the value of
$spg^{s,t}(v,v')$ can be returned in nondeterministic logarithmic space.
Moreover, there is an edge $(v,v') \in \pi$ for some $\pi \in SPaths(s,t)$ iff
$spg^{s,t}(v,v') = 1$.


We first present results for $k = 0$ (no link failures) and start by showing
that the optimistic splittable variant of the capacity problem is decidable in
polynomial time by reducing it to the feasibility of a linear program. Let
$N = (V,C,D)$ be an NCD with weight assignment $W$ and let $spg^{s,t}$ be
precomputed for all pairs of $s$ and $t$. We construct a linear program over
the variables $x^{s,t}(v,v')$ for all $s,t,v,v' \in V$ where the variable
$x^{s,t}(v,v')$ represents the fraction of the total demand $D(s,t)$ between
$s$ and $t$ that is routed through the link $(v,v')$. In the equations below,
we let $s$ and $t$ range over all nodes that satisfy $D(s,t) > 0$.

$$0 \le x^{s,t}(v,v') \le 1 \quad \text{for } s,t,v,v' \in V \tag{1}$$

$$\sum_{v \in V} x^{s,t}(s,v) \cdot spg^{s,t}(s,v) = 1 \quad \text{for } s,t \in V \tag{2}$$

$$\sum_{v \in V} x^{s,t}(v,t) \cdot spg^{s,t}(v,t) = 1 \quad \text{for } s,t \in V \tag{3}$$

$$\sum_{v' \in V} x^{s,t}(v',v) \cdot spg^{s,t}(v',v) = \sum_{v' \in V} x^{s,t}(v,v') \cdot spg^{s,t}(v,v') \quad \text{for } s,t,v \in V,\ v \notin \{s,t\} \tag{4}$$

$$\sum_{s,t \in V} x^{s,t}(v,v') \cdot spg^{s,t}(v,v') \cdot D(s,t) \le C(v,v') \quad \text{for } v,v' \in V \tag{5}$$


Equation 1 imposes that the flow portion on any link must be between 0
and 1. Equation 2 ensures that the whole demand D(s,t) is split among the
outgoing links from s that belong to the shortest path graph. Similarly,
Equation 3 guarantees that the flows on incoming links to t in the shortest
path graph deliver the total demand. Equation 4 is a flow preservation equation
among all incoming and outgoing links (in the shortest path graph) connected
to every node v. The first four equations define all possible splittings of the flow
demands for all s and t such that D(s,t) > 0. Finally, Equation 5 checks that
for every link in the network, the total sum of the flows for all s-t pairs does not
exceed the link capacity. The size of the constructed system is quadratic in the
number of nodes and its feasibility, which can be verified in polynomial time [39],
corresponds to the existence of a solution for the OS problem.


Theorem 1. The OS capacity problem without any link failures is decidable in 
polynomial time. 


If we now restrict the variables to nonnegative integers, we get an instance
of an integer linear program, for which feasibility checking is NP-complete [39]
and whose feasibility corresponds to a solution of the nonsplittable optimistic
problem.


Theorem 2. The ON capacity problem without any link failures is decidable in 
nondeterministic polynomial time. 


Next, we present a theorem stating that both the splittable and nonsplittable 
variants of the pessimistic capacity problem are decidable in polynomial time and 
in fact also in nondeterministic logarithmic space (the complexity class NL). 


Theorem 3. The PS and PN capacity problems without any link failures are 
decidable in nondeterministic logarithmic space. 


Proof. Let $N = (V,C,D)$ be a given NCD with a weight assignment $W$. Let us
consider the shortest path graph represented by $spg^{s,t}$ as defined by
Algorithm 1. Clearly, if the set $SPaths(s,t)$ for some $s,t \in V$ where
$D(s,t) > 0$ is empty, the answer to both the splittable and nonsplittable
problem is negative. Otherwise, for each pair $s,t \in V$ where $D(s,t) > 0$,
the entire demand $D(s,t)$ can be routed (both in the splittable and
nonsplittable case) through any edge $(v,v')$ that satisfies
$spg^{s,t}(v,v') = 1$. Hence we can check whether for every edge
$(v,v') \in V \times V$ it holds that

$$\sum_{\substack{s,t \in V \\ D(s,t) > 0}} D(s,t) \cdot spg^{s,t}(v,v') \le C(v,v').$$

If this is the case, then the answer to both the splittable and the nonsplittable
pessimistic problem is positive as there is no flow assignment that can exceed
the capacity of any link. On the other hand, if for some link $(v,v')$ the sum
of all demands that can possibly be routed through $(v,v')$ exceeds the link
capacity, the answer to the problem (both splittable and nonsplittable) is
negative. The algorithm can be implemented to run in nondeterministic
logarithmic space.
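
The check used in this proof can be sketched in Python as follows. The sketch
assumes that the shortest path graph of each demand pair has already been
computed and is passed in as a set of links; the names pessimistic_check and
spg_links are hypothetical. It is a plain polynomial-time version and does not
attempt to model the logarithmic-space bound.

```python
def pessimistic_check(demand, capacity, spg_links):
    """Decide PS/PN without link failures.

    spg_links[(s, t)] is the set of links lying on some shortest s-t path.
    """
    worst = {}
    for pair, d in demand.items():
        if d <= 0:
            continue
        if not spg_links.get(pair):
            return False  # a demand without any shortest path: disconnected
        for link in spg_links[pair]:
            # In the worst case the entire demand crosses this link.
            worst[link] = worst.get(link, 0) + d
    return all(load <= capacity.get(link, 0) for link, load in worst.items())

# Hypothetical example: two unit demands share the link (a, t).
demand = {("s1", "t"): 1, ("s2", "t"): 1}
spg_links = {("s1", "t"): {("s1", "a"), ("a", "t")},
             ("s2", "t"): {("s2", "a"), ("a", "t")}}
wide = {("s1", "a"): 1, ("s2", "a"): 1, ("a", "t"): 2}
narrow = {("s1", "a"): 1, ("s2", "a"): 1, ("a", "t"): 1}
```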


Let us now turn our attention to the four variants of the problem under
the assumption that up to $k$ links can fail (where $k$ is part of the input to
the decision problem). Given an NCD $N = (V,C,D)$ with a weight assignment $W$,
we are asked to check, for all (exponentially many) failure scenarios
$F \subseteq V \times V$ where $|F| \le k$, whether the pruned NCD $N^F$ with
the weight assignment $W^F$ (as defined in Definition 3) is connected and every
flow assignment is feasible (in the pessimistic case) or there exists a feasible
flow assignment (in the optimistic case). As these problems are decidable
in polynomial time for PN, PS and OS, we can conclude that the variants of the
problems with failures belong to the complexity class co-NP: for the negation of
the problems we can guess the failure scenario $F$ for which the problem does
not have a solution; this can be verified in polynomial time by Theorems 1 and 3.


Theorem 4. The PN, PS and OS problems with link failures are in co-NP. 


Finally, the same arguments can be used also for the optimistic nonsplittable
problem with failures. However, as the ON problem without failures is solvable
only in nondeterministic polynomial time, the extra quantification over all
failure scenarios means that the problem belongs to the class $\Pi_2^P$ on the
second level of the polynomial hierarchy [33]. This complexity class is believed
to be computationally more difficult than the classes on the first level of the
hierarchy (where the NP and co-NP problems belong).


Theorem 5. The ON problem with link failures is in the complexity class $\Pi_2^P$.


Complexity Lower Bounds. We now prove the complexity lower bounds. 


Theorem 6. The OS capacity problem without any link failures is P-hard under 
NC-reducibility. 


Proof sketch. By NC-reduction from the P-complete maximum flow problem for
directed acyclic graphs [35]: given a directed acyclic graph G with nonnegative
edge capacities, two nodes s and t and a number m, is there a flow between s
and t that preserves the capacity of all edges and has a volume of at least
m? This problem can be rephrased as our OS problem by setting the demand
D(s,t) = m and defining a weight assignment so that every relevant edge in
G is on some shortest path from s to t. This can be achieved by topologically
sorting the nodes (in $NC^2$ [11,12]) and assigning the weights accordingly.
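
One possible weight assignment, given here only as an assumed illustration of
"assigning the weights accordingly", sets the weight of an edge (u, v) to the
difference of the positions of v and u in a topological order. Every path from
s to t then has the same total weight, so every edge on such a path lies on a
shortest path. The sketch below is sequential and does not model the NC bound;
the function name layered_weights is hypothetical.

```python
from graphlib import TopologicalSorter

def layered_weights(edges):
    """Assign weight pos[v] - pos[u] to each edge (u, v) of a DAG."""
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    # static_order() lists every node after all of its predecessors.
    order = list(TopologicalSorter(preds).static_order())
    pos = {node: i for i, node in enumerate(order)}
    return {(u, v): pos[v] - pos[u] for u, v in edges}

# Hypothetical DAG with three different s-t paths.
edges = [("s", "a"), ("a", "t"), ("s", "t"), ("a", "b"), ("b", "t")]
w = layered_weights(edges)
```

Because the weights telescope along any path, the three paths s-t, s-a-t and
s-a-b-t all have the same length in this example.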


Theorem 7. The PS/PN problems without any link failures are NL-hard. 


Proof sketch. Follows from NL-hardness of reachability in digraphs [33]. 


Next, we show that the ON problem is NP-hard, even with no failures. 


Theorem 8. The ON capacity problem without any link failures is NP-hard, 
even for the case where all weights are equal to 1. 


Proof. By a polynomial-time reduction from the NP-complete problem CNF-SAT [33].
Let $\varphi = c_1 \wedge c_2 \wedge \ldots \wedge c_n$ be a CNF-SAT instance
where every clause $c_i$, $1 \le i \le n$, is a disjunction of literals. A
literal is either a variable $x_1, \ldots, x_k$ or its negation
$\bar{x}_1, \ldots, \bar{x}_k$. If a literal $\ell_j \in \{x_j, \bar{x}_j\}$
appears in the disjunction for the clause $c_i$, we write $\ell_j \in c_i$. A
formula $\varphi$ is satisfiable if there is an assignment of the variables
$x_1, \ldots, x_k$ to true or false, so that the formula $\varphi$ is
satisfied (evaluates to true under this assignment). For a given formula
$\varphi$ we now construct an NCD $N = (V,C,D)$ where

– $V = \{s_0, s_1, \ldots, s_k\} \cup \{x_1, \ldots, x_k\} \cup \{\bar{x}_1, \ldots, \bar{x}_k\} \cup \{c_i^s, c_i^e \mid 1 \le i \le n\}$,

– $C(s_{i-1}, x_i) = C(s_{i-1}, \bar{x}_i) = C(x_i, s_i) = C(\bar{x}_i, s_i) = n$ for all $i$, $1 \le i \le k$,

– $C(c_i^s, \ell_j) = C(s_j, c_i^e) = 1$ for all $i$, $1 \le i \le n$, and every literal $\ell_j \in \{x_j, \bar{x}_j\}$ such that $\ell_j \in c_i$,

– $D(s_0, s_k) = n$, and $D(c_i^s, c_i^e) = 1$ for all $i$, $1 \le i \le n$.




The capacities of edges and flow demands that are not mentioned above are
all set to 0 and the weights of all edges are equal to 1. In Figure 4a we give
an example of the reduction for a given satisfiable formula. As we consider the
nonsplittable problem, the flow demand from $s_0$ to $s_k$ means that the whole
demand of $n$ units must go through either the link $(x_i, s_i)$ or
$(\bar{x}_i, s_i)$, for every $i$. This corresponds to choosing an assignment of
the variables to true or false. For every clause $c_i$ we now have a unit flow
from $c_i^s$ to $c_i^e$ that must go through the link $(\ell_j, s_j)$ for some
literal $\ell_j$ appearing in the clause $c_i$. This is only possible if
this link is not already occupied by the flow demand from $s_0$ to $s_k$;
otherwise we exceed the capacity of the link. For each clause $c_i$ we need to
find at least one literal $\ell_j$ so that the flow can go through the edge
$(\ell_j, s_j)$. As the capacity of the edge $(\ell_j, s_j)$ is $n$, it is
possible to use this edge for all $n$ clauses if necessary. We can observe that
the capacity network can be constructed in polynomial time and we shall argue
for the correctness of the reduction.

Fig. 4: Reduction to ON capacity problem without/with failures
(a) NCD for an example formula with two clauses over the variables $x_1, x_2, x_3$.
The capacity of unlabelled links is 1, otherwise 2; link weights are 1. Thick
lines show a feasible nonsplittable flow assignment.
(b) Additional construction for an example quantified formula with the prefix
$\forall y_1, y_2.\ \exists x_1, x_2, x_3$ and two clauses. Capacity of all
links is 4 and weight of links is 1. Double arrows are 2-unbreakable links.
(c) Definition of m-unbreakable link of capacity n with m + 1 intermediate nodes.

We can now observe that if $\varphi$ is satisfiable, we can define a feasible
flow assignment $f$ by routing the flow demand of $n$ between $s_0$ and $s_k$
so that it does not use the links corresponding to the satisfying assignment
for $\varphi$ and then every clause in $\varphi$ can be routed through the
links corresponding to one of the satisfied literals. For the other direction
where $\varphi$ is not satisfiable, we notice that any routing of the flow
demand between $s_0$ and $s_k$ (corresponding to some truth assignment of
$\varphi$) leaves at least one clause unsatisfied and it is then impossible
to route the flow for such a clause without violating the capacity constraints.
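
The construction of the NCD from a CNF formula can be sketched in Python as
follows. The encoding of clauses as lists of signed integers, the node labels
("s", i), ("x", j), ("nx", j), ("cs", i), ("ce", i) for the chain nodes,
literal nodes and the start and end node of each clause, and the function name
cnf_to_ncd are all assumptions of this example. All weights are 1 and are
therefore not stored.

```python
def literal_node(j):
    """Map a signed integer to a literal node: j > 0 is x_j, j < 0 its negation."""
    return ("x", j) if j > 0 else ("nx", -j)

def cnf_to_ncd(clauses, num_vars):
    """Build the capacity and demand functions of the reduction as dictionaries."""
    n = len(clauses)
    capacity, demand = {}, {}
    for i in range(1, num_vars + 1):
        for lit in (("x", i), ("nx", i)):
            capacity[("s", i - 1), lit] = n   # chain link into the literal node
            capacity[lit, ("s", i)] = n       # chain link out of the literal node
    for i, clause in enumerate(clauses, start=1):
        for j in clause:
            capacity[("cs", i), literal_node(j)] = 1
            capacity[("s", abs(j)), ("ce", i)] = 1
        demand[("cs", i), ("ce", i)] = 1      # unit demand per clause
    demand[("s", 0), ("s", num_vars)] = n     # demand selecting the assignment
    return capacity, demand

# Hypothetical formula (x1 or not x2) and (x2).
capacity, demand = cnf_to_ncd([[1, -2], [2]], num_vars=2)
```

Links and demands that are absent from the dictionaries correspond to the value
0 in the construction.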


We now extend the reduction from Theorem 8 to the ON case with link
failures and prove its hardness for the second level of the polynomial hierarchy.

Theorem 9. The ON problem with link failures is $\Pi_2^P$-hard.


Proof. By reduction from the validity of the quantified Boolean formula of the
form $\forall y_1, y_2, \ldots, y_m.\ \exists x_1, x_2, \ldots, x_k.\ \varphi$
where $\varphi = c_1 \wedge c_2 \wedge \ldots \wedge c_n$ is a Boolean
formula in CNF over the variables $y_1, \ldots, y_m, x_1, \ldots, x_k$. The
validity problem of such a quantified formula is $\Pi_2^P$-hard (see e.g. [33]).
For a given quantified formula, we shall construct an instance of the ON problem
such that the formula is valid if and only if the ON problem with up to $m$
link failures (where $m$ is the number of y-variables) has a positive answer.
The reduction uses the construction from Theorem 8 where we described a
reduction from the validity of the formula
$\exists x_1, x_2, \ldots, x_k.\ \varphi$. The construction is further enhanced
by introducing new nodes $y_j, \bar{y}_j, e_j$ and new edges of capacity $2n$
(where $n$ is the number of clauses) such that
$C(y_j, \bar{y}_j) = C(\bar{y}_j, e_j) = 2n$, for all $j$, $1 \le j \le m$.

Now for every clause $c_i$ we add the so-called m-unbreakable edge of capacity
$n$ from $c_i^s$ to $y_j$ and from $e_j$ to $c_i^e$ for all $1 \le i \le n$ and
$1 \le j \le m$. Moreover, whenever the literal $y_j$ appears in the clause
$c_i$, we also add an m-unbreakable edge from $\bar{y}_j$ to $c_i^e$ and
whenever the literal $\bar{y}_j$ appears in the clause $c_i$, we add an
m-unbreakable edge from $c_i^s$ to $\bar{y}_j$. The construction of
m-unbreakable edges (denoted by double arrows) is given in Figure 4c where the
capacity of each link is set to $n$. Finally, for each $j$, $1 \le j \le m$, we
add the m-unbreakable edges from $s_0$ to $y_j$ and from $e_j$ to $s_k$. The
flow demands in the newly constructed network are identical to those from the
proof of Theorem 8. The weights of all newly added edges are set to 1, and we
set the weight of the two links from $s_0$ to $x_1$ and from $s_0$ to
$\bar{x}_1$ to 6. The reduction can clearly be done in polynomial time. Figure 4b
demonstrates an extension of the construction from Figure 4a with additional
nodes and links that complete the reduction. Observe that even in case of $m$
link failures, the unbreakable links that consist of $m + 1$ edge-disjoint paths
are still capable of carrying all the necessary flow traffic.

We shall now argue that if the formula
$\forall y_1, y_2, \ldots, y_m.\ \exists x_1, x_2, \ldots, x_k.\ \varphi$ is
valid then the constructed instance of the ON problem with up to $m$ link
failures has a solution. We notice that any subset of up to $m$ failed links
either breaks exactly one of the newly added edges $(y_j, \bar{y}_j)$ and
$(\bar{y}_j, e_j)$ for all $j$, $1 \le j \le m$, in which case this determines
a valid truth assignment for the y-variables and, as in the previous proof, the
flow from $s_0$ to $s_k$ can now be routed so that for each clause there is at
least one satisfied literal; or there is a variable $y_j$ such that both of the
edges $(y_j, \bar{y}_j)$ and $(\bar{y}_j, e_j)$ are present and all flow
demands can now be routed through these two edges (that have sufficient
capacity for this) by using the m-unbreakable edges. The opposite direction
where the formula is not valid means that there is a truth assignment to the
y-variables so that irrespective of the assignment for the x-variables there is
at least one clause that is not satisfied. We simply fail the edges that
correspond to such a y-variable assignment and the same arguments as in the
previous proof imply that there is no feasible flow assignment for this failure
scenario.


Theorem 10. The PN, PS and OS problems with link failures are co-NP-hard. 


Proof sketch. By reduction from the NP-complete shortest path most vital edges
problem (SP-MVE) [3,36]. The input to SP-MVE is a directed graph $G = (V,E)$
with positive edge weights, two nodes $s,t \in V$ and two positive numbers $k$
and $H$. The question is whether there exist at most $k$ edges in $E$ such that
their removal creates a graph with the length of the shortest path between $s$
and $t$ being at least $H$. We reduce SP-MVE to the negation of the PN/PS
problems in order to demonstrate co-NP-hardness.

We modify $G$ by inserting a new edge between $s$ and $t$ of weight $H$ and
capacity 1, while setting the capacity to 2 for all other edges in $G$. If the
SP-MVE problem has a solution $F \subseteq E$ where $|F| \le k$, then the added
edge $(s,t)$ becomes one of the shortest paths between $s$ and $t$ under the
failure scenario $F$ and a flow demand of size 2 between $s$ and $t$ can be
routed through this edge, violating the capacity constraints. If the SP-MVE
problem does not have a solution, then after the removal of at most $k$ links,
the length of the shortest path between $s$ and $t$ remains strictly less than
$H$ and any flow assignment along the shortest paths is feasible. We hence
conclude that PN/PS problems are co-NP-hard. A small modification of the
construction is needed for hardness of the OS problem.


Algorithm 2 Brute-force search
Input: NCD $N = (V,C,D)$ with weight assignment $W$, a number $k \ge 0$ and type
       of the capacity problem $\tau \in \{PS, PN, ON, OS\}$
Output: true if the answer to the $\tau$-problem is positive, else false
for all $F \subseteq V \times V$ s.t. $|F| \le k$ and $C(v,v') > 0$ for all $(v,v') \in F$ do
    construct network $N^F$ and weight assignment $W^F$ by Definition 3
    switch $\tau$ do
        case OS: use Theorem 1 on $N^F$ and $W^F$ (without failed links)
        case ON: use Theorem 2 on $N^F$ and $W^F$ (without failed links)
        case PS/PN: use Theorem 3 on $N^F$ and $W^F$ (without failed links)
    if the answer to the $\tau$-problem on $N^F$ and $W^F$ is negative then return false
endfor
return true


4 A Fast Strategic Search Algorithm 


In order to solve the PS, PN, ON and OS problems, we can enumerate all 
failure scenarios for up to k failed links (omitting the links with zero capacity), 
construct the pruned network for each such failure scenario and then apply our 
algorithms in Theorems 1, 2 and 3. This brute-force search approach is formalized 
in Algorithm 2 and its worst-case running time is exponential. 
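
The enumeration performed by the brute-force search can be sketched in Python
as follows. The decision procedure for a single failure scenario is abstracted
into a callback; the names failure_scenarios, brute_force and check are
hypothetical and the sketch does not reproduce the paper's implementation.

```python
from itertools import combinations

def failure_scenarios(capacity, k):
    """Yield every set of at most k failed links, skipping zero-capacity links."""
    links = sorted(link for link, c in capacity.items() if c > 0)
    for size in range(k + 1):
        for failed in combinations(links, size):
            yield frozenset(failed)

def brute_force(capacity, k, check):
    """check(F) decides the chosen problem variant on the pruned network."""
    return all(check(failed) for failed in failure_scenarios(capacity, k))

# Hypothetical network with three usable links and one zero-capacity link.
capacity = {("s", "a"): 1, ("a", "t"): 1, ("s", "t"): 2, ("t", "s"): 0}
scenarios = list(failure_scenarios(capacity, 2))
```

With $\ell$ usable links, the number of scenarios is
$\sum_{i=0}^{k} \binom{\ell}{i}$, which is exponential when $k$ is part of the
input; for the example above it is 1 + 3 + 3 = 7.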

Our complexity results indicate that the exponential behavior of any
algorithm solving a co-NP-hard (or even $\Pi_2^P$-hard) problem is unavoidable
(unless P=NP). However, in practice many concrete instances can be solved fast
if more refined search algorithms are used. To demonstrate this, we present a
novel strategic search algorithm for verifying the feasibility of shortest path
routing under failures. At the heart of our algorithm lies the idea to reduce
the number of explored failure scenarios by skipping the “uninteresting” ones.
Let us fix an NCD $N = (V,C,D)$ with the weight assignment $W$. We define a
relation $\prec$ on failure scenarios such that $F \prec F'$ iff for all flow
demands we preserve in $F'$ at least one of the shortest paths that are present
under the failure scenario $F$.


Definition 9. Let $F, F' \subseteq V \times V$. We say that $F$ precedes $F'$,
written $F \prec F'$, if $SPaths^F(s,t) \supseteq SPaths^{F'}(s,t)$ and
$SPaths^F(s,t) \cap SPaths^{F'}(s,t) \neq \emptyset$ for all $s,t \in V$ where
$D(s,t) > 0$.
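
The relation between two failure scenarios can be tested directly once the sets
of shortest paths under both scenarios are known. In the Python sketch below,
each scenario is represented by a dictionary from demand pairs to sets of
paths; this representation and the function name precedes are assumptions of
the example.

```python
def precedes(spaths_f, spaths_f2):
    """True if scenario F (first argument) precedes scenario F' (second argument).

    For every demand pair, F' must keep a nonempty subset of the shortest
    paths that exist under F.
    """
    return all(
        spaths_f[pair] >= spaths_f2[pair]
        and bool(spaths_f[pair] & spaths_f2[pair])
        for pair in spaths_f
    )

# Hypothetical shortest paths for one demand pair under three scenarios.
p1, p2, p3 = ("s", "a", "t"), ("s", "b", "t"), ("s", "c", "d", "t")
no_failures = {("s", "t"): {p1, p2}}
one_failed = {("s", "t"): {p1}}        # a link of p2 failed, p1 survives
both_failed = {("s", "t"): {p3}}       # p1 and p2 are gone, a longer path takes over
```

In this example the empty scenario precedes the scenario that only removes p2,
but it does not precede the scenario in which a new, longer shortest path
appears.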


We first show that if $F \prec F'$ and the failure scenario $F$ has a feasible
routing solution for the pessimistic problem, then $F'$ also has a solution.
Thus instead of exploring all possible failure scenarios like in the brute-force
algorithm, it is sufficient to explore only failure scenarios that are minimal
w.r.t. the $\prec$ relation.


Lemma 3. Let $F, F' \subseteq V \times V$ where $F \prec F'$. A positive answer
to the PS/PN problem for the network $N^F$ with weight assignment $W^F$ implies
a positive answer to the PS/PN problem for the network $N^{F'}$ with weight
assignment $W^{F'}$.


For the optimistic scenario, the implication is valid in the opposite direction:
it is sufficient to explore only the maximal failure scenarios w.r.t. $\prec$.


Lemma 4. Let $F, F' \subseteq V \times V$ where $F \prec F'$. A positive answer
to the OS/ON problem for the network $N^{F'}$ with weight assignment $W^{F'}$
implies a positive answer to the OS/ON problem for the network $N^F$ with
weight assignment $W^F$.


Hence for the pessimistic scenario, the idea of strategic search is to ignore 
failure scenarios that remove only some of the shortest paths but preserve at least 
one of such shortest paths. For the optimistic scenario, we on the other hand 
explore only the maximal failure scenarios where removing one additional link 
causes the removal of all shortest paths for at least one source and destination. 

In our algorithm, we use the notation $spg^F$ for the shortest path graph as
defined in Algorithm 1 for the input graph $N^F$ with weight assignment $W^F$.
The function min_cuts($spg^F$, s, t) returns the set of all minimum cuts
separating the nodes $s$ and $t$ (sets of edges that disconnect the source node
$s$ from the target node $t$ in the shortest-path graph $spg^F$). This function
can be computed e.g. using the Provan and Shier algorithm [34], assuming that
each edge has a unit weight and hence minimizing the number of edges in the
minimum cut. There can be several incomparable minimum cuts (with the same
number of edges) and by mincut_size($spg^F$, s, t) we denote the number of
edges in each of the minimum cuts from the set min_cuts($spg^F$, s, t).
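
For small shortest-path graphs, the two functions can be emulated by exhaustive
search, as in the Python sketch below. This brute-force enumeration is an
assumption of the example and merely stands in for the Provan and Shier
algorithm; it is exponential in the number of links.

```python
from itertools import combinations

def reachable(links, s, t):
    """Depth-first search over directed links."""
    seen, stack = {s}, [s]
    while stack:
        v = stack.pop()
        for a, b in links:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return t in seen

def min_cuts(links, s, t):
    """Return all smallest sets of links whose removal disconnects t from s."""
    links = sorted(links)
    if not reachable(links, s, t):
        return []
    for size in range(1, len(links) + 1):
        cuts = [frozenset(c) for c in combinations(links, size)
                if not reachable(set(links) - set(c), s, t)]
        if cuts:
            return cuts
    return []

def mincut_size(links, s, t):
    cuts = min_cuts(links, s, t)
    return len(cuts[0]) if cuts else 0

# Hypothetical shortest-path graph: two disjoint two-hop paths from s to t.
diamond = {("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")}
cuts = min_cuts(diamond, "s", "t")
```

The diamond has four incomparable minimum cuts, each consisting of one link
from either path.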

Algorithm 3 now presents our fast search strategy, called strategic search.
The input to the algorithm is the same as for the brute-force search. The
algorithm initializes the pending set of failure scenarios to be explored to the
empty failure scenario and it remembers the set of passed failure scenarios that
were already verified. In the main while loop, a failure scenario $F$ is removed
from the pending set and depending on the type $\tau$ of the problem, we either
directly verify the scenario $F$ in the case of the pessimistic problems, or we
call the function MaxFailureCheck(F) that instead verifies all maximal failure
scenarios $F'$ such that $F \prec F'$. The correctness of Algorithm 3 is
formally stated as follows.


Theorem 11. Algorithm 3 terminates and returns true iff the answer to the
$\tau$-problem is positive.


Algorithm 3 Strategic search
Input: NCD $N = (V,C,D)$ with weight assignment $W$, a number $k \ge 0$ and type
       of capacity problem $\tau \in \{PS, PN, ON, OS\}$
Output: true if the answer to the $\tau$-problem is positive, else false
pending := $\{\emptyset\}$    \* initialize the pending set with the empty failure scenario *\
passed := $\emptyset$    \* already processed failure scenarios *\
while pending $\neq \emptyset$ do
    let $F \in$ pending; pending := pending $\setminus \{F\}$
    switch $\tau$ do
        case $\tau \in \{PS, PN\}$: Build $N^F$ and $W^F$ by Definition 3, use Theorem 3
            if the answer to the $\tau$-problem was negative then return false
        case $\tau \in \{OS, ON\}$: call MaxFailureCheck(F)
    passed := passed $\cup \{F\}$
    for $s,t \in V$ such that $D(s,t) > 0$ do
        if $|F|$ + mincut_size($spg^F$, s, t) $\le k$ then
            succ := $\{F \cup C \mid C \in$ min_cuts($spg^F$, s, t), $F \cup C \notin$ (pending $\cup$ passed)$\}$
            pending := pending $\cup$ succ
endwhile
return true

procedure MaxFailureCheck(F)    \* to be run only for the optimistic cases *\
    for $s,t \in V$ such that $D(s,t) > 0$ do
        for $C \in$ min_cuts($spg^F$, s, t) do
            for all $C' \subset C$ such that $|F \cup C'| = \min(k, |F \cup C| - 1)$ do
                if $F \cup C' \notin$ passed then
                    construct $N^{F \cup C'}$ and $W^{F \cup C'}$ by Definition 3
                    switch $\tau$ do
                        case $\tau$ = OS: use Theorem 1 and if negative then return false
                        case $\tau$ = ON: use Theorem 2 and if negative then return false
                    passed := passed $\cup \{F \cup C'\}$
            endfor
        endfor
    endfor


5 Experiments 


To evaluate the practical performance of our strategic search algorithms, we con- 
ducted experiments on various wide-area and datacenter network topologies. The 
reproducibility package with our Python implementation can be found at [37]. 
We study the algorithms’ performance on a range of network topologies, 
and consider both sparse and irregular wide-area networks (using the Internet 
Topology Zoo [28] data set) as well as dense and regular datacenter topologies 
(namely fat-tree [9], BCube [23], and Xpander [40]). To model demands, for 
each topology, we consider certain nodes to serve as core nodes which have 
significant pairwise demands. Overall, we created 24,388 problem instances for
our experimental benchmark, out of which we were able to solve 23,934 instances
within a 2-hour timeout. In our evaluation, we filter out the trivial instances
where the runtime is less than 0.1 second for both the brute-force and strategic
search (as some of the instances e.g. contain a disconnected flow demand already
without any failed links). The benchmark contains a mixture of both positive
and negative instances for each problem for increasing number k of failed links.

Topology      Problem  B.iter   B.time  S.iter  S.time   Speedup
BCube         ON          105     79.5       1     1.7      47.1
BCube         OS         2081    348.2     768   125.1       2.8
BCube         PS/PN      5051    170.0       1     0.1    4684.0
Fat-tree      ON          105     59.4       1     1.2      47.6
Fat-tree      OS           41      2.0       1     0.2       8.5
Fat-tree      PS/PN     43745    562.6       1     0.1   66976.3
Xpander       ON          254    407.3       1     3.0     137.7
Xpander       OS          170    124.1       1     1.6      78.0
Xpander       PS/PN         -  >7200.0       1     5.4   >1340.6
Topology Zoo  ON          127     59.6       8     4.6      12.9
Topology Zoo  OS          596     35.3      46     2.6      13.4
Topology Zoo  PS/PN        86      4.3       2     0.1      82.7

Fig. 5: Median results, time in seconds (B: brute-force search, S: strategic search)

Figure 5 shows the median times for each series of experiments for the different
scenarios. All experiments for each topology and given problem instance are
sorted by the speedup ratio, i.e. B.time divided by S.time; we display the result
for the experiment in the middle of each sorted series. Our strategic search
algorithm outperforms the brute-force one by a significant factor in all
the scenarios. We also report on the number of iterations (B.iter and S.iter) of
the two algorithms, showing the number of failure scenarios to be explored.
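
The selection of the reported row can be sketched as follows. The timing values
in the example are made up for illustration and are not taken from the
benchmark; the function name median_by_speedup is hypothetical.

```python
def median_by_speedup(runs):
    """runs: list of (b_time, s_time) pairs in seconds for one series.

    Sort the runs by the speedup ratio b_time / s_time and return the
    run in the middle of the sorted list.
    """
    ranked = sorted(runs, key=lambda run: run[0] / run[1])
    return ranked[len(ranked) // 2]

# Made-up series with speedups 10, 2 and 3.
runs = [(10.0, 1.0), (4.0, 2.0), (9.0, 3.0)]
b_time, s_time = median_by_speedup(runs)
speedup = b_time / s_time
```

Because the row is chosen by the ratio of the two times, the displayed times
belong to the same experiment rather than being medians computed independently.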

Let us first discuss the pessimistic scenarios in more detail. Figure 6 shows a
cactus plot [6] for the wide-area network setting (on the left) and for the
datacenter setting (on the right). We note that the y-axis in the figure is
logarithmic. For example, to solve the 1500th fastest instance in the wide-area
network (left),
the brute-force algorithm uses more than 100 seconds, while the strategic algo- 
rithm solves the problem in less than a second; this corresponds to a speedup of 
more than two orders of magnitude. For more difficult instances, the difference 
in runtime continues to grow exponentially, and becomes several orders of mag- 
nitude. For datacenter networks (right), the difference is even larger. The latter 
can be explained by the fact that datacenters provide a higher path diversity 
and multiple shortest paths between source and target nodes and hence more op- 
portunities for a clever skipping of “uninteresting instances”. As the pessimistic 
problems we aim to solve are co-NP-hard, there are necessarily some hard in- 
stances also for our strategic search; this is demonstrated by the S-shaped curve 
showing a significantly increased runtime for the most difficult instances. 

Fig. 6: Pessimistic scenario. Left: wide-area networks, right: datacenter networks

Fig. 7: Optimistic scenario. Left: wide-area networks, right: datacenter networks

We next discuss the optimistic scenarios, including the experiments both for
splittable and nonsplittable cases. Figure 7 shows a cactus plot for the wide-area
network setting (on the left) and for the datacenter setting (on the right). Again,
our strategic algorithm significantly outperforms the baseline in both scenarios.
Interestingly, in the optimistic scenario, the relative performance benefit is larger
for wide-area networks as the optimistic strategic search explores all the maximal
failure scenarios and there are significantly more of such scenarios in the
highly connected datacenter topologies. Hence, while for datacenters (right) the
strategic search maintains about one order of magnitude better performance,
the performance for the wide-area networks improves exponentially.


6 Conclusion 


We presented a comprehensive study of the algorithmic complexity of verifying 
feasible routes under failures without violating capacity constraints, covering 
both optimistic and pessimistic, as well as splittable and nonsplittable scenarios. 
We further presented algorithms, based on strategic failure scenario
enumerations, which our experiments showed to be efficient in realistic
scenarios. While our paper charts the
complete landscape, there remain several interesting avenues for future research 
like further scalability improvements and a parallelization of the algorithm. 


Acknowledgements. Research supported by the Vienna Science and Technology

Abstract. Writing classification rules to identify interesting network
traffic is a time-consuming and error-prone task. Learning-based classi- 
fication systems automatically extract such rules from positive and neg- 
ative traffic examples. However, due to limitations in the representation 
of network traffic and the learning strategy, these systems lack both ex- 
pressiveness to cover a range of applications and interpretability in fully 
describing the traffic’s structure at the session layer. This paper presents 
the Sharingan system, which uses program synthesis techniques to generate
network classification programs at the session layer. Sharingan accepts 
raw network traces as inputs and reports potential patterns of the target 
traffic in NetQRE, a domain specific language designed for specifying 
session-layer quantitative properties. We develop a range of novel op- 
timizations that reduce the synthesis time for large and complex tasks 
to a matter of minutes. Our experiments show that Sharingan is able 
to correctly identify patterns from a diverse set of network traces and 
generates explainable outputs, while achieving accuracy comparable to 
state-of-the-art learning-based systems. 


Keywords: Program synthesis - Network traffic analysis - Supervised 
learning. 


1 Introduction 


Network monitoring systems are essential for network infrastructure manage- 
ment. These systems require classification of network traffic at their core. Today, 
network operators and equipment vendors write classification programs or pat- 
terns upfront in order to differentiate target flows such as attacks or undesired 
application traffic from normal ones. The process of writing these classification 
programs often requires deep operator insights, can be error prone, and is not 
easy to extend to handle new scenarios. 

There have been a number of recent attempts at automated generation
of classifiers for malicious traffic using machine learning [16,38,5,12] and data
mining [6,28,34,39,19] techniques. These classifiers have not gained much traction
in production systems, in part due to unavoidable false positive reports and the
gap between the learning output and explainable operational insights [31]. The
challenges call for a more expressive, interpretable and maintainable
learning-based classification system.

To be specific, such challenges first come from the extra difficulties learning- 
based systems face in network applications compared to traditional use cases 
such as recommendation systems, spam mail filtering or OCR [31].
Misclassifications in network systems have tangible costs, such as the need for
operators to manually verify potential false reports. Due to the diverse nature and large
data volumes of networks in production environments, entirely avoiding these
costly mistakes with a single training stage is unlikely. Therefore, explainability and
maintainability play a core role in a usable learning system.

Properly representing network traffic and learnt patterns is another major 
difficulty. As a data point for classification purposes, a network trace is a se- 
quence of packets of varying lengths listed in increasing timestamp order. Ex- 
isting approaches frequently compress it into a regular expression or a feature 
vector for input. Such compression will eliminate session-layer details and inter- 
mediate states in network protocols, making it hard to learn application-layer 
protocols or multi-stage transactions. These representations also require labo- 
rious task-specific feature engineering to get effective learning results, which 
undermines the systems’ advantages of automation. It can also be hard to in- 
terpret the learning results to understand the intent and structure of the traffic, 
due to the blackbox model of many machine-learning approaches and the lack 
of expressiveness in the inputs and outputs to these learning systems. 

To address the above limitations, we introduce Sharingan, which uses pro- 
gram synthesis techniques to auto-generate network classification programs from 
labeled examples of network traffic traces. Sharingan aims to bridge the gap be- 
tween learning systems and operator insights, by identifying properties of the 
traffic that can help inform the network operators on its nature, and provide a ba- 
sis for automated generation of the classification rules. Sharingan does not aim 
to outperform state-of-the-art learning systems in accuracy, but rather match 
their accuracy, while generating output that is more explainable and easier to 
maintain. 

To achieve these goals, we adopt techniques from syntax-guided program
synthesis [1] to generate a NetQRE [37] program that distinguishes the positive and
negative examples. NetQRE, which stands for Network Quantitative Regular 
Expressions, enables quantitative queries for network traffic, based on flow-level 
regular pattern matching. Given an input network trace, a NetQRE program 
generates a numerical value that quantifies the matching of the trace with the 
described pattern. The classification is done by comparing the synthesized pro- 
gram’s output for each example with a learnt threshold T. Positive examples fall 
above T. The synthesized NetQRE program serves the role of network classifier, 
identifying flows which match the program specifications. 

Sharingan has the following key advantages over prior approaches, which
rely either on keyword and regular expression generation [6,28,34,39,19] or on
statistical traffic analysis [16,38,5,12].

Requires minimal feature engineering: NetQRE [37] is an expressive
language, and allows succinct description of a wide range of tasks ranging from
detecting security attacks to enforcing application-layer network management 
policies. Sharingan can synthesize any network task on raw traffic expressible 
as a NetQRE program, without any additional feature engineering. This is an 
improvement over systems based on manually extracted feature vectors. Also, 
one outstanding feature of search-based program synthesis is that the only a pri- 
ori knowledge it needs is information about the language itself. No task-specific 
heuristics are required. 

Efficient implementation: The NetQRE program synthesized by Sharingan 
can be compiled, as has been shown in prior work [37], to efficient low-level 
implementations that can be integrated into routers and other network devices. 
On the other hand, traditional statistical classifiers are not directly usable or 
executable in network filtering systems. 

Easy to decipher and edit: Finally, Sharingan generates NetQRE programs 
that can be read and edited. Since they are generic executable programs with 
high expressiveness, the patterns in the program reveal the stateful protocol 
structure that is used for the classification, which blackbox statistical models, 
packet-level regular expressions and feature vectors have difficulty describing. 
The programs are also amenable to calibration by a network operator, for ex- 
ample, to mix in local policies or debug. 

The key technical challenge in the design and implementation of Sharingan is
the computationally demanding problem of finding a NetQRE expression that is
able to separate positive network traffic examples from the negative ones. This
search problem is an instance of syntax-guided synthesis. While this problem
has received a lot of attention in recent years, no existing tools and techniques 
can solve the instances of interest in our context due to the unique semantics 
of NetQRE programs, the complexity of the expressions to be synthesized and 
the scale of the data set of network traffic examples used in training. To address 
this challenge, we devised two novel techniques for optimizing the search,
partial execution and merge search, which together reduce synthesis time by
orders of magnitude. We summarize our key contributions:
Synthesis-based classification architecture. We propose the methodology 
of reducing a network traffic classification problem to a synthesis from examples 
instance. 

Efficient synthesis algorithm. We devise two efficient algorithms: partial
execution and merge search, which efficiently explore the program space and 
enable learning from very large data sets. Independent of our network traffic 
classification use cases, these algorithms advance the state-of-the-art in program 
synthesis. 

Implementation and evaluation. We have implemented Sharingan and eval- 
uated it for a rich set of metrics using the CICIDS2017 [25,7] intrusion detection 
benchmark database. Sharingan is able to synthesize a large range of network 
classification programs in a matter of minutes with accuracy comparable to 
state-of-the-art systems. Moreover, the generated NetQRE program is easy to
interpret and tune, and can be compiled into configurations usable by existing
network monitoring systems.


2 Overview 


Sharingan’s workflow is largely similar to a statistical supervised learning system, 
although the underlying mechanism is different. Sharingan takes labeled positive 
and negative network traces as input and outputs a classifier that can classify 
any new incoming trace. To preserve most of the information from input data 
and minimize the need for feature engineering, Sharingan considers three kinds 
of properties in a network trace: (1) all available packet-level header fields, (2) 
position information of each packet within the sequence, and (3) time information 
associated with each packet. 

Specifically, Sharingan represents a network trace as a stream of feature
vectors: $S = v_0, v_1, v_2, \ldots$. Each vector represents a packet. Vectors are listed in
timestamp order. Contents of the vector are parsed field values of that packet. 
For example, we can define 

v[0] = ip.src, v[1] = tcp.sport, v[2] = ip.dst,.... 

Depending on the information available, different sets of fields can be used 
to represent a packet. By default, we extract all header fields at the TCP/IP 
level. To make use of the timestamp information, we also append the time interval
since the previous packet in the same flow to a packet’s feature vector. Feature 
selection is not necessary for Sharingan. 
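
As a rough illustration of this representation, the following Python sketch turns a list of parsed packets into a timestamp-ordered stream of feature vectors and appends the per-flow inter-packet gap. The `Packet` class, the `flow_id` key, the three-field `FIELD_ORDER`, and the choice of 0.0 as the gap for the first packet of a flow are assumptions made for the example; they are not details taken from Sharingan's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    timestamp: float          # capture time in seconds
    flow_id: str              # hypothetical key identifying the packet's flow
    fields: Dict[str, int]    # parsed header fields, e.g. {"ip.src": ...}


# Illustrative subset of header fields; the paper extracts all TCP/IP fields.
FIELD_ORDER = ["ip.src", "tcp.sport", "ip.dst"]


def to_feature_stream(packets: List[Packet]) -> List[List[float]]:
    """Return one feature vector per packet, listed in timestamp order."""
    last_seen: Dict[str, float] = {}
    stream: List[List[float]] = []
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        vec: List[float] = [pkt.fields[name] for name in FIELD_ORDER]
        prev = last_seen.get(pkt.flow_id)
        # Assumption: the first packet of a flow gets an interval of 0.0.
        vec.append(0.0 if prev is None else pkt.timestamp - prev)
        last_seen[pkt.flow_id] = pkt.timestamp
        stream.append(vec)
    return stream
```

No feature selection happens here: every listed field is copied into the vector unchanged.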

The output classifier is a NetQRE program p that takes in a stream of feature 
vectors. Instead of giving a probability score that the data point is positive, it 
outputs an integer that quantifies the matching of the stream and the pattern. 
The program includes a learnt threshold T. Sharingan aims to ensure that p’s 
outputs for positive and negative traces fall on different sides of the threshold 
T. Comparing p’s output for a data point with T generates a label. It is possible 
to translate p and T into executable rules using a compilation step. 

Given the above usage model, a network operator can use Sharingan to gen- 
erate a NetQRE program trained to distinguish normal and suspected abnormal 
traffic generated from unsupervised learning systems. The synthesized programs 
themselves, as we will later show, form the basis for deciphering each unknown 
trace. Consequently, traces whose patterns look interesting can be subjected to 
a detailed manual analysis by the network operator. Moreover, the generated 
NetQRE programs can be further refined and compiled into rules for filtering
systems.


3 Background on NetQRE 


NetQRE [37] is a high-level declarative language for querying network traffic. 
Streams of tokenized packets are matched against regular expressions and ag- 
gregated by multiple types of quantitative aggregators. The NetQRE language 
is defined by the BNF grammar in Listing 1.1.

<classifier> ::= <program> > <value>
<program>    ::= <group-by>
<group-by>   ::= (<group-by>)<op>|<feats>
               | <qre>
<qre>        ::= (<qre> <qre>)<op>
               | (<qre>)*<op>
               | <unit>
<unit>       ::= /<re>/
<re>         ::= <re> <re>
               | (<re>)*
               | <pred>
<pred>       ::= <pred> && <pred>
               | <pred> || <pred>
               | [<feat> == <value>]
               | [<feat> >= <value>]
               | [<feat> <= <value>]
               | [<feat> -> <prefix>]
<feats>      ::= <feat>
               | <feats>, <feat>
<feat>       ::= 0 | 1 | 2 | ...
<op>         ::= max | min | sum


Listing 1.1: NetQRE Grammar 


As an example, if we want to find out whether any single source is sending more than
100 TCP packets, the following classifier, based on a NetQRE program, expresses
this check:

( ( /[ip.type == TCP]/ )*sum )max|ip.src_ip > 100


At the top level, the classifier has two parts: a processing program on
the left that maps a network trace to an output number, and a threshold on the
right against which this value is compared. Together they form the classifier.
Inputs fall into different classes based on the results of the comparison. 

The group-by expression (<group-by>) splits the trace into sub-flows based on the
value of the specified field (source IP address in this example):


( ... )max|ip.src_ip


Packets sharing the same value in the field are assigned to the same sub-flow.
Sub-flows are processed individually, and their outputs are aggregated
according to the aggregation operator (<op>) (maximum in this example).

In each sub-flow, we want to count the number of TCP packets. This can be 
broken down into three operations: (1) specifying a pattern that a single packet 
is a TCP packet, (2) specifying that this pattern repeats an arbitrary number of
times, and (3) adding 1 to a counter each time this pattern is matched. 

(1) is achieved by a plain regular expression involving predicates. A predicate 
describes properties of a packet that can match or mismatch one packet in the 
trace. Four types of properties frequently used in networks can be described: 


1. It equals a value. For example: [tcp.syn == 1] 

2. It is not less than a value. For example: [ip.len >= 200] 
3. It is not greater than a value. For example: [tcp.seq <= 15]

4. It matches a prefix. For example: [ip.src_ip -> 192.168] 


Predicates combined by concatenation and Kleene-star form a plain regular ex- 
pression, which matches a network trace considered as a string of packets. 

A unit expression indicates that a plain regular expression should be viewed 
as atomic for quantitative aggregation (in this case a single TCP packet):

/[ip.type == TCP]/

It either matches a substring of the trace and outputs the value 1, or does not 
match. 


To achieve (2) and (3), we need a construct to both connect the regular
patterns to match the entire flow and also aggregate outputs bottom up from
units at the same time. We call it a quantitative regular expression (<qre>). In this
example, we use the iteration operator:

( /[ip.type == TCP]/ )*sum


It matches exactly like the Kleene-star operator, and at the same time, for each 
repetition of the sub-pattern, the sub-expression’s output is aggregated by the 
aggregation operator. In this case, the sum is taken, which acts as a counter for 
the number of TCP packets. The aggregation result for this expression will in 
turn be returned as an output for higher-level aggregations. 
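
To make the behaviour of the running example concrete, the sketch below mimics it in ordinary Python: split the trace by source IP, count TCP packets in each sub-flow, take the maximum over sub-flows, and compare with the threshold. It follows the intent described in the text rather than NetQRE's exact matching semantics, and the dictionary-based packet representation with keys `"ip.type"` and `"ip.src_ip"` is an assumption made for the example.

```python
from collections import defaultdict


def max_tcp_count_per_source(trace):
    """Group packets by source IP, count TCP packets per group, return the max."""
    counts = defaultdict(int)
    for pkt in trace:
        if pkt["ip.type"] == "TCP":        # the predicate inside the unit
            counts[pkt["ip.src_ip"]] += 1  # iteration with sum acts as a counter
    # The group-by aggregates the sub-flow outputs with max.
    return max(counts.values(), default=0)


def classify(trace, threshold=100):
    """The classifier is the program's output compared against the threshold."""
    return max_tcp_count_per_source(trace) > threshold
```

Here `classify` plays the role of the whole classifier, while `max_tcp_count_per_source` corresponds to the program on the left of the comparison.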

The language also supports the concatenation operator: 


(<qre> <qre>)<op> 


which works analogously to concatenation for regular matching. It aggregates the
quantity by applying the <op> on the outputs of two sub-expressions that match 
the prefix and suffix. 


In addition to this core language, there is a specialization for the synthesis 
purpose. We observe that comparing a field with values that do not appear in any 
of the given examples is expensive but will not produce any meaningful informa- 
tion. Therefore we use the relative position in the examples’ value space instead 
of a specific value, for example, 50% instead of 3 in value space {1,3, 12,15}. 
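
One plausible way to realise such relative positions is sketched below in Python. The rounding rule (the smallest observed value whose rank covers the requested percentage) is an assumption: it was chosen only because it reproduces the single example given in the text, and the helper name `percentile_to_value` is hypothetical.

```python
def percentile_to_value(value_space, percent):
    """Map a relative position (integer 0..100) to a value seen in the examples."""
    values = sorted(set(value_space))
    # Integer form of ceil(percent / 100 * n) - 1, clamped to the first element.
    index = max(0, (percent * len(values) + 99) // 100 - 1)
    return values[index]
```

With the value space {1, 3, 12, 15}, a position of 50 maps back to 3 under this rule, so the synthesizer only ever compares against values that occur in the examples.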


4 Synthesis Algorithm 


Given a set of positive and negative examples Ep and En, respectively, the goal
of our synthesis algorithm is to derive a NetQRE program $p_f$ and a threshold T
that separates Ep from En. We start with notations to be used in this
section: 

Notation. p and q denote individual programs, and P and Q denote sets of
programs. $p_1 \to p_2$ denotes that it is possible to mutate $p_1$ following production rules
in NetQRE’s grammar to get $p_2$. The relation $\to$ is transitive. We assume the
starting symbol is always <program>.

p(x) denotes program p’s output on input x, where x is a sequence of packets
and p(x) is a numerical value. If p is an incomplete program, i.e., if p contains
some non-terminals, then $p(x) = \{q(x) \mid p \to q\}$ is a set of numerical values,
containing x’s output through all possible programs p can mutate into. We
define p(x).max to be the maximum value in this set. Similarly, p(x).min is the
minimum value.

The synthesis goal can be formally defined as: $\forall e \in E_p,\ p_f(e) > T$ and
$\forall e \in E_n,\ p_f(e) < T$.
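
For a complete program, checking this goal amounts to finding a threshold that lies strictly between the largest negative output and the smallest positive output. The Python sketch below illustrates the check; `find_threshold` is a hypothetical helper, programs are modelled as plain callables, and picking the midpoint is an arbitrary choice among all valid thresholds.

```python
def find_threshold(program, positives, negatives):
    """Return a threshold T with program(e) > T on positives and < T on negatives.

    Returns None when no such threshold exists. Both example sets must be
    non-empty.
    """
    largest_negative = max(program(e) for e in negatives)
    smallest_positive = min(program(e) for e in positives)
    if largest_negative < smallest_positive:
        # Any value strictly between the two works; the midpoint is one choice.
        return (largest_negative + smallest_positive) / 2
    return None
```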


4.1 Overview 


Our design needs to address two key challenges. First, NetQRE’s rich grammar 
allows a large possible program space and many possible thresholds for search. 
Second, the need to check each possible program against a large data set collected
from network monitoring tasks poses a scalability challenge to the synthesis.

Fig. 1: Synthesizer Overview


We propose two techniques for addressing these challenges: partial execution 
(Section 4.2) and merge search (Section 4.3). Figure 1 shows an overview of the 
synthesizer. 

The top-level component is the search planner, which assigns search tasks over
subsets of the entire training data to the enumerator in a divide-and-conquer
manner. Each such task is a search-based synthesis instance, where the
enumerator enumerates all possible programs starting from $s_0$, expanded using the
productions in the NetQRE grammar, until one that can distinguish the assigned
subset of Ep and En is found.

The enumerator optimizes for the first challenge by querying the distributed 
oracle about each partial program’s feasibility and doing pruning early. The 
oracle evaluates partial programs using partial execution. The search planner 
optimizes for the second challenge by merging search results from subsets of the 
large training data, so as to save unnecessary checking, which we call the merge 
search strategy. 

We next explain each technique in detail in the rest of this section. 


4.2 Partial Execution 


A partial program is an incomplete program with non-terminals. Similar to prior
work that uses overestimation on regular expressions and imperative languages
for early pruning in the search process [14,29,30], we want to evaluate a partial
NetQRE program for the feasibility of all its possible completions, so as to
decide early whether any of them can serve as a proper classifier for Ep and En.

This process includes three main steps: (1) finding an equivalent completion
$\hat{p}$ of a partial program p so that evaluating $\hat{p}$ on any input x is equivalent to
evaluating the combination of all possible completions of p on x, (2) efficiently
evaluating $\hat{p}(x)$, (3) deciding whether to discard p based on the evaluation result.

Equivalent Completion: Recall that we define p(x) of a partial program p to
be the union of all q(x) such that $p \to q$. Since we mainly care about whether the outputs
of positive and negative examples fall on different sides of a threshold, the essential
information is the upper and lower bounds for p(x). Therefore, the criterion for
finding an equivalent completion is that the bounds of $\hat{p}(x)$ should include p(x) for
any input x.

Many non-terminals have a straightforward equivalent completion. We re- 
place (1) any uncertain numerical value with the largest or smallest possible 
value depending on the context, (2) any unknown predicate with unknown, (3) 
any unknown regular expression with _* and (4) any unknown quantitative reg- 
ular expression with (/_ _*/)*sum. We skip the formal proof of correctness of 
this approach. Intuitively, the first two include all possible values at the position, 
and the latter two include all possible matching and aggregation strategies for a 
trace. 


There are some non-terminals that do not have an equivalent completion, 
such as <group-by> and <op>. While doing enumeration, we put a complexity 
penalty over these non-terminals if they are not expanded, therefore encouraging 
earlier expansion of them so that partial execution is possible. 


Computing Ambiguity: Notice that regular patterns naturally allow multiple
matching strategies if a character (packet) in the input can match more than
one predicate in the program, which is why we can estimate a set of NetQRE
programs by one equivalent completion $\hat{p}$. The goal, and also the major challenge,
in evaluating $\hat{p}(x)$ on an arbitrary input x is to compute the quantitative outputs
from all valid matching strategies, whose number can grow exponentially with the input
trace’s length.


Fig. 2: Illustration of an unambiguous program. Predicate A matches packet C’s while predicate
B matches packet D.

Fig. 3: Illustration of the first 3 steps of strategy one when predicate B is not yet explored.


To solve the problem of too many matching strategies, we use an approxi- 
mation: merging “close” matching strategies. Two strategies are defined to be 
“close” if at some step of their matching process (1) they have matched the same 
number of packets in the trace and (2) the last predicate they have matched is 
exactly the same. We explore all matching strategies simultaneously and do a 
merging whenever two strategies can be identified to be close. Notice that each 
matching strategy maintains a distinct copy of aggregation states for every <qre> 
expression. States for the same expression, as well as the final results, are merged
into one interval. 


As an example, Figures 2, 3, 4, and 5 illustrate the evaluation process of a partial
program during the search for the following pattern with CCCCD as input:


( ( /AA/ )*sum ( /B/ )*sum )max


Fig. 4: Illustration of the first 3 steps of strategy two.

Fig. 5: Illustration of the last 2 steps of merged strategy one & two.


By the properties of interval arithmetic and regular expressions, it can be
proven that the approximation result always contains the true output range.
More formally, $\hat{p}(x).\min \le p(x).\min \le p(x).\max \le \hat{p}(x).\max$, where
$\hat{p}(x)$ denotes the interval computed by the approximate evaluation.

Intuitively, the proposed evaluation scheme works well because we only care
about the boundary of outputs, which are represented by intervals as the abstract
data type. We implement the execution and approximation process with the Data
Transducer model proposed in [2], which, for a given program, uses a small
constant amount of memory and time linear in the input trace’s length.
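
The interval abstraction can be pictured with a small Python sketch. The `Interval` class and its method names are assumptions made for illustration; the implementation described above uses the Data Transducer model, not this code. The sketch only shows why merging states keeps the result sound: each operation returns bounds that enclose every value reachable from the operands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def __add__(self, other):
        # sum aggregation: bounds add component-wise
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def max_with(self, other):
        # max aggregation of two uncertain values
        return Interval(max(self.lo, other.lo), max(self.hi, other.hi))

    def min_with(self, other):
        # min aggregation of two uncertain values
        return Interval(min(self.lo, other.lo), min(self.hi, other.hi))

    def join(self, other):
        # merging two "close" matching strategies: keep the enclosing interval
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def contains(self, value):
        return self.lo <= value <= self.hi
```

Because `join` never shrinks an interval, merged states can only over-approximate the set of true outputs, which is the property the containment claim relies on.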


Make Decision: To make a decision regarding a partial program p, let q be
a complete program and assume there is only one pair of examples $e_p$ and $e_n$.
For q to classify $e_p$ and $e_n$ correctly, there must be a threshold T such that
$q(e_n).\max < T < q(e_p).\min$. Therefore, given a pair of examples $e_p$ and $e_n$, a program q
is correct if and only if $q(e_n).\max < q(e_p).\min$. When this holds, any value
between $q(e_n).\max$ and $q(e_p).\min$ can be used as the threshold.


Lemma 1: There exists a correct program q such that $p \to q$ only if
$p(e_n).\min < p(e_p).\max$.

Lemma 2: If $p(e_n).\max < p(e_p).\min$ then any program q such that $p \to q$ is
correct.


From Lemma 1, we can decide if p must be rejected. From Lemma 2, we can
decide if p must be accepted. These criteria can be extended to more than one
pair of examples. We do not give formal proofs of the lemmas. Figures 6 and
7 show two intuitive examples of the decision-making process
(they do not necessarily represent properties of real data sets). Each vertical bar
represents the output range of the corresponding data point produced by the
program under investigation.
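
The two lemmas translate into a simple three-way decision once the output range of every example is known. The Python sketch below shows one straightforward generalisation to several examples, applying each lemma to every positive/negative pair; the function name `decide`, the `(min, max)` tuple encoding, and the three string labels are assumptions made for the example.

```python
def decide(pos_ranges, neg_ranges):
    """Classify a partial program from the (min, max) output range per example.

    Returns "accept" if every completion is correct (Lemma 2 on all pairs),
    "reject" if no completion can be correct (Lemma 1 fails for some pair),
    and "expand" if the partial program must be explored further.
    """
    # Lemma 2 for all pairs: every negative maximum is below every positive minimum.
    if max(hi for _, hi in neg_ranges) < min(lo for lo, _ in pos_ranges):
        return "accept"
    # Lemma 1 violated for some pair: a negative minimum reaches a positive maximum.
    if max(lo for lo, _ in neg_ranges) >= min(hi for _, hi in pos_ranges):
        return "reject"
    return "expand"
```

In the "expand" case the ranges overlap without ruling out a separating completion, so the enumerator keeps refining the program.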


Fig. 6: A correct program found. No negative output can ever be greater than any positive
output. 5.5 can be used as a threshold.

Fig. 7: A bad program. pos 1 can never be greater than neg 3.


4.3 Merge Search


In this subsection, we describe three heuristics for scaling up synthesis
to large data sets, namely divide and conquer, simulated annealing, and parallel 
processing. We call the combination of these the merge search technique. 
Divide and Conquer. Enumerating and verifying programs on large data sets 
is expensive. Our core strategy to improve performance is to learn patterns on 
small subsets and merge them into a global pattern with low overhead. 

It is based on two observations: First, the pattern of the entire data set is 
usually shaped by a few extreme data points. Looking at these extreme data 
points locally is enough to figure out critical properties of the global pattern. 
Second, conflicts in local patterns mostly describe different aspects of the
same target rather than fundamental differences, and thus can be resolved by simple
merge operations such as disjunction, truncation or concatenation.

This divide and conquer strategy is captured in the following algorithm: 


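def divide_and_conquer(dataset):
    # Recurse on halves while the data set is too large to search directly.
    if dataset.size > threshold:
        subsetL, subsetR = split(dataset)
        candidateL = divide_and_conquer(subsetL)
        candidateR = divide_and_conquer(subsetR)
        # Re-synthesize on the combined data, reusing both sub-patterns.
        return merge(dataset, candidateL, candidateR)
    else:
        # Base case: plain enumerative synthesis from the start symbol s0.
        return synthesize(dataset, s0)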


The “split” step corresponds to evenly splitting positive and negative
examples. Then sub-patterns are synthesized on smaller subsets. The conquer, or
“merge”, step requires synthesizing the pattern again on the combined dataset.
However, sub-patterns are reused in two ways to speed up this search.

First, if we see a sub-pattern as an AST, then its low-level sub-trees up to 
certain depth threshold are added to the syntax as a new production option for 
the corresponding non-terminal at the sub-tree’s root. They can then serve as 
shortcuts for likely building blocks. Second, the sub-patterns’ skeletons left after 
removing these sub-trees are used as seeds for higher-level searches, which serve 
as shortcuts for likely overall structures. Both are given complexity rewards to 
encourage the reuse. 

In practice, many search results can be directly reused from cached results
generated from previous tasks on similar subsets. This optimization can further
reduce the synthesis time.

Simulated Annealing. When searching for local patterns at lower levels, we
require the enumerator to find not 1 but t candidate patterns for each subset.
Such searches are fast for smaller data sets and can cover a wider range of possible
patterns. As the search goes to higher levels for larger data sets, we discard the
least accurate local patterns and also reduce t. The search will focus on refining
the currently optimal global pattern. This idea is based on traditional simulated
annealing algorithms and helps to improve the synthesizer’s performance in many
cases.

Parallelization. Most steps in the synthesis process are inherently
parallelizable. They include (1) doing synthesis on different subsets of data, (2) exploring
different programs in the enumeration, (3) verifying different programs found so 
far, (4) executing a program on different data points during the verification. 

We focus less on optimizing (1) and (2) since they are not the performance 
bottlenecks. We instead focus on parallelizing (3) and (4) over multiple cores. In 
our implementation, using 5 machines with 32 cores each, we devote one thread 
each to run task (1) and (2) on one machine, 64 threads on the same machine to 
run task (3), and 512 threads distributed over the remaining four machines to 
run task (4). The distributed version is approximately two orders of magnitude 
faster than the single-threaded version for complex tasks. Given more computing 
power, a proportional speedup can be expected. 
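
As a small-scale picture of task (4), the following Python sketch evaluates one program on many traces concurrently. It is only an illustration: the implementation described above is written in C++ and distributed over several machines, whereas this sketch uses a thread pool from the Python standard library, and the `evaluate_parallel` helper and its `workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_parallel(program, traces, workers=8):
    """Run `program` on every trace and return the outputs in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(program, traces))
```

Each data point is independent of the others, which is why this step scales with the number of available workers.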


5 Evaluation 


We implemented Sharingan in 10K lines of C++ code. Our experiments are 
carried out in a cluster of five machines directly connected by Ethernet cables, 
each with 32 Intel(R) Xeon(R) E5-2450 CPUs. The frequency for each core is 
2.10GHz. Arrangements of tasks are explained in the last part of Section 4.3. We
evaluate the minimal feature engineering (5.1), accuracy (5.2), interpretability
and editability (5.3), efficient implementation (5.4), and synthesis algorithm
efficiency (5.5) aspects of Sharingan in order.


5.1 Data Preparation 


We utilize eight types of attacks from the CICIDS2017 database [25,7], a public
repository of benign and attack traffic used for evaluating intrusion detection 
systems. They cover a wide range of attack traffic including botnets, Denial of 
service (DoS), port scanning, and password cracking. 

The data is labelled per flow by an attack type or “Benign”. We learn each 
type of attack against benign traffic separately. To use as much data as possible, 
for each attack type, we use 1500 positive (attack) flows and 10000 negative (be- 
nign) flows for training, and another distinct data set of similar size for testing. 

The main benefit of Sharingan in this step is the minimal need for feature 
engineering. We simply use all header fields of TCP and IP, and the inter-packet 
arrival time between adjacent packets in the same flow as features. In total, there 
are 19 features per packet and N x 19 features per trace of length N. 

In contrast, other state-of-the-art systems rely on a carefully designed fea- 
ture extraction step to work well. For example, the feature vectors included in 
CICIDS2017 database contain 84 features extracted by the CICFlowMeter [9,13] 
tool for each flow, characterizing performance metrics of the entire flow such as 
duration, mean forward packet length, min activation time, etc. Kitsune [16] 
extracts bandwidth information over the past short periods as packet-level
features. DECANTeR [6] uses HTTP-level properties such as constant header fields,
language, amount of outgoing information, etc. as flow-level features.


5.2 Learning Accuracy


We next validate Sharingan’s learning accuracy using the following evaluation 
methodology. For each individual attack type, we use the training data (attack 
and normal traffic) as input to Sharingan to learn a NetQRE program. The 
NetQRE program is then validated on the corresponding testing set for accu- 
racy. The output of Sharingan includes a NetQRE program that maps a network 
trace to an integer output and a recommended range for the threshold. By mod- 
ifying the threshold, true positive rate (TP) and false positive rate (FP) can 
be adjusted, as we will later explain in Section 5.3. We use the AUC-ROC (Area
Under the Curve of the Receiver Operating Characteristic) metric, which is a standard
statistical measure of classification performance.
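
For readers unfamiliar with the metric, AUC-ROC can be computed directly from the program's outputs as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The Python sketch below uses this pairwise formulation; it is a generic illustration and not the evaluation code used for the experiments.

```python
def auc_roc(pos_outputs, neg_outputs):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_outputs:
        for n in neg_outputs:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_outputs) * len(neg_outputs))
```

A value of 1.0 means some threshold separates the two classes perfectly, while 0.5 corresponds to outputs that carry no ranking information.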


Fig. 8: Sharingan’s true positive rate under low false positive rate, AUC-ROC and learning rate for
8 attacks in CICIDS2017 (higher is better)


Figure 8 contains results for eight types of attacks. Apart from AUC-ROC 
values, we also show the true positive rates when false positive rate is adjusted to 
3 different levels: 0.001, 0.01, and 0.03. Given that noise is common in most net- 
work traffic, the last metric shown in Figure 8 is the highest achievable learning 
rate. 

Overall, we observe that Sharingan performs well across a range of attacks 
with accuracy numbers on par with prior state-of-the-art systems such as
Kitsune, which has an average AUC-ROC value of 0.924 on nine types of IoT-based
attacks, and DECANTeR, which has an average detection rate of 97.7% and a 
false positive rate of 0.9% on HTTP-based malware. In six out of eight attacks, 
Sharingan achieves above 0.994 of AUC-ROC and 100% of true positive rate at 
1% false positive rate. The major exception is Botnet ARES, which consists of a 
mix of malicious attack vectors. Handling such multi-vector attacks is an avenue 
for our future work. 


5.3 Post-processing and Interpretation 


One of the benefits of Sharingan is that it generates an actual classification 
program that can be further adapted and tuned by a network operator. The 
program itself is also close to the stateful nature of session-layer protocols and 
attacks, and thus is readable and provides a basis for the operator to understand 
the attack cause. We briefly illustrate these capabilities in this section. 

FP-TP Tradeoff. Network operators need to occasionally tune a classifier’s
sensitivity to false positives and true positives. Sharingan generates a NetQRE
program with a threshold T. This threshold can be adjusted to vary the false
positive and true positive rates. Figures 9 and 10 show the output distribution
from positive and negative examples in the DoS Hulk attack. A denotes the
largest negative output and B denotes the smallest positive output. When A >
B, there is some unavoidable error. We can slide the threshold T from B to A
and obtain an ROC curve for the test data, as illustrated in Figure 11.
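
The threshold sweep can be sketched in a few lines of Python. The helper names `roc_points` and `best_tpr_at_fpr` are hypothetical, and the sketch assumes, as in the text, that an example is labelled positive when the program's output is above the threshold.

```python
def roc_points(pos_outputs, neg_outputs):
    """Return (threshold, false positive rate, true positive rate) per threshold."""
    thresholds = sorted(set(pos_outputs) | set(neg_outputs))
    points = []
    for t in thresholds:
        tpr = sum(p > t for p in pos_outputs) / len(pos_outputs)
        fpr = sum(n > t for n in neg_outputs) / len(neg_outputs)
        points.append((t, fpr, tpr))
    return points


def best_tpr_at_fpr(pos_outputs, neg_outputs, max_fpr):
    """Highest true positive rate reachable without exceeding max_fpr."""
    candidates = (tpr for _, fpr, tpr in roc_points(pos_outputs, neg_outputs)
                  if fpr <= max_fpr)
    return max(candidates, default=0.0)
```

An operator who tolerates more false alarms simply picks a lower threshold from the returned points.
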
Interpretation. We describe a learnt NetQRE program to demonstrate how a
network operator can interpret the classifiers.³ The NetQRE program
synthesized by Sharingan for the DDoS task above is:

( ( /_* A _* B _*/ )*sum /_* C _*/ )sum > 4

Where

A = [ip.src_ip->[0%,50%]]    B = [tcp.rst==1]

C = [time_since_last_pkt <=50%]


DDoS is a flood attack from a botnet of machines to exhaust memory re- 
sources on the victim server. The detected pattern consists of packets that start 
with source IP in a certain range, followed by a packet with the reset bit set to 
1, and then a packet with a short time interval from its predecessor. Finally, the 
program considers the flow a match if the patterns show up with a total count 
of over 4. 

The range of source IP addresses specified in the pattern possibly contains 
botnet IP addresses. Attack flows are often reset when the load cannot be han- 
dled or the flows’ states cannot be recognized, which indicates the attack is suc- 
cessfully launched. Packets with short intervals further support a flood attack. 
Unique properties of a DDoS attack are indeed captured by this program!

Refinement by Human Knowledge Finally, an advantage of generating a
program for classification is that it enables the operator to augment the gener- 
ated NetQRE program with domain knowledge before deployment. For example, 
in the DDoS case, if they know that the victim service is purely based on TCP, 
they can append [ip.type = TCP] to all predicates. Alternatively, if they know 
that the victim service is designed for 1000 requests per second, they can explicitly
replace the arrival time interval with 1ms. The modified program then
is:

( ( /_* A _* B _*/ )*sum /_* C _*/ )sum > 4

Where

A = [ip.type = TCP]&&[ip.src_ip->[0%,50%]]
B = [ip.type = TCP]&&[tcp.rst==1]
C = [ip.type = TCP]&&[time_since_last_pkt <=1ms]

3 A full list of learnt NetQRE programs can be found in our tech report
https://arxiv.org/abs/2010.06135.


5.4 Deployment Scenarios 


We now describe three ways for network operators to deploy the output of 
Sharingan: (1) taking action hinted by the interpretation; (2) directly executing 
the NetQRE program as a monitoring system; and (3) translating the NetQRE 
program to rules in other monitoring systems. 

Revisiting the DDoS example in Section 5.3, in the first case, the operator 
may refine the source IP part to find out the accurate range of attacker machines 
and block them. 

If the NetQRE program itself is to be used as a monitoring system, its runtime
system can be directly deployed on any general purpose machine. Prior work [37]
has shown that NetQRE achieves performance comparable to optimized
low-level implementations. Moreover, these programs can be easily compiled into
other formats acceptable to existing monitoring systems.


5.5 Program Synthesis Performance 


Synthesis time. In our final experiment, we measure the performance of Sharingan
in terms of the time needed for program synthesis.

Figure 12 shows the program complexity (Y-axis) and synthesis (learning) 
time (in minutes). Not surprisingly, complex programs require more time to 
synthesize. We further observe that Sharingan is able to synthesize complex 
programs with at least 20-30 terms, mostly within minutes to an hour, which 
is practical for many real-world use cases and can be further reduced through 
parallelism over more machines. As a comparison, Kitsune reports training times 
between 8 minutes and 52 minutes on individual attacks [16], and DECANTeR 
reports training times between 5 hours and 10 hours on individual users’ data 
[6].

Fig. 12: Time-complexity relation
Fig. 13: Impact of optimizations on synthesis performance


Effectiveness of Optimizations. We explore the effectiveness of the individual
optimization strategies described in Section 4. In Figure 13, we compare
the synthesis time and the number of programs searched for a fully optimized
Sharingan against results from disabling each optimization. SSH Patator is used 
as the demonstrating example since it is moderately complex. 

We observe that disabling the partial execution optimization makes both metrics
significantly worse. Being able to prune early can indeed greatly reduce time
wasted on unnecessary exploration and checking. When merge search is disabled,
the number of programs searched decreases, but the total synthesis time
increases because of the overhead of having to check each program against the entire
data set. The synthesis cannot finish within reasonable time if both are disabled.

In summary, all optimization strategies are effective in speeding up the synthesis
process. A synthesis task that is otherwise impossible to finish within practical 
time can now be done in less than 15 minutes. 


6 Related Work 


Auto-Generation of Network Configurations. Broadly speaking, a network
traffic classification rule is a type of network configuration. There are other lines
of research that aim at the automatic generation of different categories of network
configurations. EasyACL [15] aims at the synthesis of access control lists (ACLs) from
natural language descriptions. NetGen [24], NetComplete [10] and Genesis [32]
synthesize data plane routing configurations based on SMT solvers given policy
specifications. NetEgg [36] instead takes examples provided by the user to generate
routing configurations in an interactive way. Sharingan focuses on network traffic
classification and thus has a different target.

Other Learning-based Systems. Apart from the competing systems we explicitly
compared against above, there are other learning-based systems that operate
under different settings from Sharingan.

Unsupervised learning systems are useful for recognizing outliers and other
types of “abnormal” flows [17,38,35], most notably in intrusion detection systems.
Their ability to differentiate unknown types of traffic from known ones cannot
be replaced by Sharingan. Sharingan can augment unsupervised learning systems
by reducing the effort required for analyzing the reported traces.

Learning systems using state machines [18] or regular expressions for payload
strings [34] as models both share the advantage of requiring minimal feature
engineering. The former generate less succinct models compared to Sharingan
and are typically used for verification of network protocols. The latter learn
patterns at the individual packet level rather than the session level.

There are state-of-the-art point solutions focusing on specific scenarios rather
than general-purpose network traffic classification. For example, PrivateEye focuses
on detecting privacy breaches in the cloud [4]. RFDIDS solves intrusion
detection challenges unique to power systems [26].

Syntax-Guided Synthesis. Sharingan builds on a large body of work on
syntax-guided synthesis [11,21,23,20,22,29,27]. However, synthesis techniques pro- 
posed in this paper go beyond the state of the art, and have the potential to be 
applied to other applications of program synthesis.

Partial execution shares similarities with the overestimation idea in [14] (see also
follow-ups [29,30,33]), where the system learns plain regular expressions and 
overestimates the feasibility of a non-terminal with a Kleene-star. But no prior 
work proposed an overestimation algorithm for quantitative stream query lan- 
guages similar to NetQRE. Nor do they consider the specification format for a 
classifier program with unknown numerical thresholds. 

[3] proposed a divide-and-conquer strategy similar to merge search for opti- 
mizing program synthesis. It is focused on standard SyGuS tasks based on logical 
constraints and uses a decision tree to combine sub-patterns instead of trying to
merge them into one compact program. Merge search proposed in this work is 
not specific to Sharingan, and can be used in other synthesis tasks to allow the 
handling of large data sets. 

Finally, there is no prior work that solely uses program synthesis to perform 
accurate real-world large-scale classification. The closest work concerns simple 
low-accuracy programs synthesized as weak learners [8], and requires a separate 
SVM to assemble them into a classifier. 


7 Conclusion 


This paper presents Sharingan, which develops syntax-guided synthesis tech- 
niques to automatically generate NetQRE programs for classifying session-layer 
network traffic. Sharingan can be used for generating network monitoring queries 
or signatures for intrusion detection systems from labeled traces. Our results 
demonstrate three key value propositions for Sharingan, namely minimal fea- 
ture engineering, efficient implementation, and interpretability as well as ed- 
itability. While achieving accuracy comparable to state-of-the-art statistical and 
signature-based learning systems, Sharingan is significantly more usable and requires
synthesis time practical for real-world tasks.


Abstract. The model of asynchronous programming arises in many contexts,
from low-level systems software to high-level web programming.
We take a language-theoretic perspective and show general decidability 
and undecidability results for asynchronous programs that capture all 
known results as well as show decidability of new and important classes. 
As a main consequence, we show decidability of safety, termination and 
boundedness verification for higher-order asynchronous programs—such 
as OCaml programs using Lwt—and undecidability of liveness verifica- 
tion already for order-2 asynchronous programs. We show that under 
mild assumptions, surprisingly, safety and termination verification of 
asynchronous programs with handlers from a language class are decidable 
iff emptiness is decidable for the underlying language class. Moreover, 
we show that configuration reachability and liveness (fair termination) 
verification are equivalent, and decidability of these problems implies de- 
cidability of the well-known “equal-letters” problem on languages. Our 
results close the decidability frontier for asynchronous programs. 


Keywords: Higher-order asynchronous programs · Decidability


1 Introduction 


Asynchronous programming is a common way to manage concurrent requests in 
a system. In this style of programming, rather than waiting for a time-consuming 
operation to complete, the programmer can make asynchronous procedure calls 
which are stored in a task buffer pending later execution. Each asynchronous 
procedure, or handler, is a sequential program. When run, it can change the 
global shared state of the program, make internal synchronous procedure calls, 
and post further instances of handlers to the task buffer. A scheduler repeatedly 
and non-deterministically picks pending handler instances from the task buffer 
and executes their code atomically to completion. Asynchronous programs appear
in many domains, such as operating system kernel code, web programming,
or user applications on mobile platforms. This style of programming is supported
natively or through libraries for most programming environments. The interleav- 
ing of different handlers hides latencies of long-running operations: the program 
can process a different handler while waiting for an external operation to finish. 
However, asynchronous scheduling of tasks introduces non-determinism in the 
system, making it difficult to reason about correctness. 


An asynchronous program is finite-data if all program variables range over 
finite domains. Finite-data programs are still infinite state transition systems: 
the task buffer can contain an unbounded number of pending instances and the 
sequential machine implementing an individual handler can have unboundedly 
large state (e.g., if the handler is given as a recursive program, the stack can 
grow unboundedly). Nevertheless, verification problems for finite-data programs 
have been shown to be decidable for several kinds of handlers [12,30,20,6]. Sev- 
eral algorithmic approaches have been studied, which tailor to (i) the kinds of 
permitted handler programs and (ii) the properties that are checked. 


State of the art We briefly survey the existing approaches and what is known 
about the decidability frontier. The Parikh approach applies to (first-order) re- 
cursive handler programs. Here, the decision problems for asynchronous pro- 
grams are reduced to decision problems over Petri nets [12]. The key insight is 
that since handlers are executed atomically, the order in which a handler posts 
tasks to the buffer is irrelevant. Therefore, instead of considering the sequential 
order of posted tasks along an execution, one can equivalently consider its Parikh 
image. Thus, when handlers are given as pushdown systems, the behaviors of an
asynchronous program can be represented by a (polynomial sized) Petri net. 
Using the Parikh approach, safety (formulated as reachability of a global state), 
termination (whether all executions terminate), and boundedness (whether there 
is an a priori upper bound on the task buffer) are all decidable for asynchronous 
programs with recursive handlers, by reduction to corresponding problems on 
Petri nets [30,12]. Configuration reachability (reachability of a specific global 
state and task buffer configuration), fair termination (termination under a fair 
scheduler), and fair non-starvation (every pending handler instance is eventually 
executed) are also decidable, by separate ad hoc reductions to Petri net reach- 
ability [12]. A “reverse reduction” shows that Petri nets can be simulated by 
polynomial-sized asynchronous programs (already with finite-data handlers). 


In the downclosure approach, one replaces each handler with a finite-data 
program that is equivalent up to “losing” handlers in the task buffer. Of course, 
this requires that one can compute equivalent finite-data programs for given 
handler programs. This has been applied to checking safety for recursive han- 
dler programs [3]. Finally, a bespoke rank-based approach has been applied to 
checking safety when handlers can perform restricted higher-order recursion [6]. 


Contribution Instead of studying individual kinds of handler programs, we 
consider asynchronous programs in a general language-theoretic framework. The 
class of handler programs is given as a language class C: An asynchronous pro- 
gram over a language class C is one where each handler defines a language from 
C over the alphabet of handler names, as well as a transformer over the global
state. This view leads to general results: we can obtain simple characterizations
of which classes of handler programs permit decidability. For example, we do 
not need the technical assumptions of computability of equivalent finite-data 
programs from the Parikh and the downclosure approach. 

Our first result shows that, under a mild language-theoretic assumption, 
safety and termination are decidable if and only if the underlying language class 
C has a decidable emptiness problem.¹ Similarly, we show that boundedness is
decidable iff finiteness is decidable for the language class C. These results are 
the best possible: decidability of emptiness (resp., finiteness) is already
required to verify safety or termination (resp., boundedness) of a single
sequential handler call. As corollaries, we get
new decidability results for all these problems for asynchronous programs over 
higher-order recursion schemes, which form the language-theoretic basis for pro- 
gramming in higher-order functional languages such as OCaml [21,28], as well 
as other language classes (lossy channel languages, Petri net languages, etc.). 

Second, we show that configuration reachability, fair termination, and fair 
starvation are mutually reducible; thus, decidability of any one of them implies 
decidability of all of them. We also show decidability of these problems implies 
the decidability of a well-known combinatorial problem on languages: given a 
language over the alphabet {a,b}, decide if it contains a word with an equal 
number of as and bs. Viewed contrapositively, we conclude that all these deci- 
sion problems are undecidable already for asynchronous programs over order-2 
pushdown languages, since the equal-letters problem is undecidable for this class. 

Together, our results “close” the decidability frontier for asynchronous pro- 
grams, by demonstrating reducibilities between decision problems heretofore 
studied separately and connecting decision problems on asynchronous programs 
with decision problems on the underlying language classes of their handlers. 

While our algorithms do not assume that downclosures are effectively com- 
putable, we use downclosures to prove their correctness. We show that safety, 
termination, and boundedness problems are invariant under taking downclosures 
of runs; this corresponds to taking downclosures of the languages of handlers. 

The observation that safety, termination, and boundedness depend only on 
the downclosure suggests a possible route to implementation. If there is an effec- 
tive procedure to compute the downclosure for class C, then a direct verification 
algorithm would replace all handlers by their (regular) downclosures, and in- 
voke existing decision procedures for this case. Thus, we get a direct algorithm 
based on downclosure constructions for higher order recursion schemes, using 
the string of celebrated recent results on effectively computing the downclosure 
of word schemes [33,15,7]. 

We find our general decidability result for asynchronous programs to be surprising.
Already for regular languages, the complexity of safety verification jumps
from NL (NFA emptiness) to EXPSPACE (Petri net coverability): asynchronous
programs are far more expressive than individual handler languages. It is therefore
surprising that safety and termination verification remains decidable whenever
it is decidable for individual handler languages.

1 The “mild language-theoretic assumption” is that the class of languages forms an
effective full trio: it is closed under intersections with regular languages, homomorphisms,
and inverse homomorphisms. Many language classes studied in formal language
theory and verification satisfy these conditions.

Full proofs of our results are available in [25].


2 Preliminaries 


Basic Definitions We assume familiarity with basic definitions of automata theory
(see, e.g., [18,31]). The projection of a word $w$ onto some alphabet $\Sigma'$, written
$\mathrm{Proj}_{\Sigma'}(w)$, is the word obtained by erasing from $w$ each symbol which does not
belong to $\Sigma'$. For a language $L$, define $\mathrm{Proj}_{\Sigma'}(L) = \{\mathrm{Proj}_{\Sigma'}(w) \mid w \in L\}$. The
subword order $\sqsubseteq$ on $\Sigma^*$ is defined as $w \sqsubseteq w'$ for $w, w' \in \Sigma^*$ if $w$ can be obtained
from $w'$ by deleting some letters from $w'$. For example, $abba \sqsubseteq bababa$
but $abba \not\sqsubseteq baaba$. The downclosure ${\downarrow}w$ with respect to the subword order of a
word $w \in \Sigma^*$ is defined as ${\downarrow}w := \{w' \in \Sigma^* \mid w' \sqsubseteq w\}$. The downclosure ${\downarrow}L$ of
a language $L \subseteq \Sigma^*$ is given by ${\downarrow}L := \{w \in \Sigma^* \mid \exists w' \in L \colon w \sqsubseteq w'\}$. Recall that
the downclosure ${\downarrow}L$ of any language $L$ is a regular language [17].
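
As a small illustration of the subword order and of the downclosure of a single word, consider the following Python sketch. The function names are ours and purely illustrative; for a finite word the downclosure is a finite set and can be enumerated directly, which is of course not how downclosures of infinite languages are computed.

```python
def is_subword(w, v):
    """True iff w can be obtained from v by deleting letters of v."""
    remaining = iter(v)
    # 'ch in remaining' advances the iterator, so letters are matched in order.
    return all(ch in remaining for ch in w)


def downclosure_of_word(w):
    """The finite set of all subwords of w."""
    subwords = {""}
    for ch in w:
        # Each existing subword may or may not be extended by the next letter.
        subwords |= {s + ch for s in subwords}
    return subwords


print(is_subword("abba", "bababa"))   # the positive example from the text
print(is_subword("abba", "baaba"))    # the negative example from the text
print(sorted(downclosure_of_word("aba")))
```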

A multiset $m \colon \Sigma \to \mathbb{N}$ over $\Sigma$ maps each symbol of $\Sigma$ to a natural number.
Let $\mathbb{M}[\Sigma]$ be the set of all multisets over $\Sigma$. We treat sets as a special case
of multisets where each element is mapped onto 0 or 1. As an example, we
write $m = [a,a,c]$ for the multiset $m \in \mathbb{M}[\{a,b,c,d\}]$ such that $m(a) = 2$,
$m(b) = m(d) = 0$, and $m(c) = 1$. We also write $|m| = \sum_{\sigma \in \Sigma} m(\sigma)$.

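Given two multisets $m, m' \in \mathbb{M}[\Sigma]$ we define the multiset $m \oplus m' \in \mathbb{M}[\Sigma]$
for which, for all $a \in \Sigma$, we have $(m \oplus m')(a) = m(a) + m'(a)$. We also define
the natural order $\preceq$ on $\mathbb{M}[\Sigma]$ as follows: $m \preceq m'$ iff there exists $m^{\Delta} \in \mathbb{M}[\Sigma]$
such that $m \oplus m^{\Delta} = m'$. We also define $m' \ominus m$ for $m \preceq m'$ analogously: for all
$a \in \Sigma$, we have $(m' \ominus m)(a) = m'(a) - m(a)$. For $\Sigma \subseteq \Sigma'$ we regard $m \in \mathbb{M}[\Sigma]$
as a multiset of $\mathbb{M}[\Sigma']$ where undefined values are sent to 0.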
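
The multiset operations above map directly onto Python's `collections.Counter`. The sketch below is only an illustration; the helper names `oplus`, `leq`, `ominus`, and `size` are ours.

```python
from collections import Counter


def oplus(m1, m2):
    """Multiset sum: counts are added pointwise."""
    return m1 + m2


def leq(m1, m2):
    """Natural order: m1 is below m2 iff every count in m1 is covered by m2."""
    return all(m2[a] >= k for a, k in m1.items())


def ominus(m2, m1):
    """Multiset difference m2 minus m1, defined only when m1 is below m2."""
    if not leq(m1, m2):
        raise ValueError("difference is only defined when m1 is below m2")
    return m2 - m1


def size(m):
    """Total number of elements, counted with multiplicity."""
    return sum(m.values())


m = Counter("aac")                    # the multiset [a, a, c] from the text
print(m["a"], m["b"], m["c"], size(m))
print(oplus(m, Counter("b")))
print(ominus(m, Counter("ac")))
```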


Language Classes and Full Trios A language class is a collection of languages, 
together with some finite representation. Examples are the regular (e.g. rep- 
resented by finite automata) or the context-free languages (e.g. represented by 
pushdown automata or PDA). A relatively weak and reasonable assumption on a 
language class is that it is a full trio, that is, it is closed under each of the follow- 
ing operations: taking intersection with a regular language, taking homomorphic 
images, and taking inverse homomorphic images. Equivalently, a language class 
is a full trio iff it is closed under rational transductions [5]. 

We assume that all full trios C considered in this paper are effective: Given
a language $L$ from C, a regular language $R$, and a homomorphism $h$, we can
compute a representation of the languages $L \cap R$, $h(L)$, and $h^{-1}(L)$ in C.

Many classes of languages studied in formal language theory form effective 
full trios. Examples include the regular and the context-free languages [18], the 
indexed languages [2,10], the languages of higher-order pushdown automata [26], 
higher-order recursion schemes (HORS) [16,9], Petri nets [14,19], and lossy channel
systems (see Section 4.1). (While HORS are usually viewed as representing
a tree or collection of trees, one can also view them as representing a word
language, as we explain in Section 5.)

Informally, a language class defined by non-deterministic devices with a finite-state
control that allows $\varepsilon$-transitions and imposes no restriction between input
letter and performed configuration changes (such as non-deterministic pushdown 
automata) is always a full trio: The three operations above can be realized by 
simple modifications of the finite-state control. The deterministic context-free 
languages are a class that is not a full trio. 


Asynchronous Programs: A Language-Theoretic View We use a language- 
theoretic model for asynchronous shared-memory programs. 


Definition 1. Let C be an (effective) full trio. An asynchronous program (AP)
over C is a tuple $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$, where $D$ is a finite set of global
states, $\Sigma$ is an alphabet of handler names, $(L_c)_{c \in \mathfrak{C}}$ is a family of languages from
C, one for each $c \in \mathfrak{C}$ where $\mathfrak{C} = D \times \Sigma \times D$ is the set of contexts, $d_0 \in D$ is the
initial state, and $m_0 \in \mathbb{M}[\Sigma]$ is a multiset of initial pending handler instances.

A configuration $(d, m) \in D \times \mathbb{M}[\Sigma]$ of $\mathfrak{P}$ consists of a global state $d$ and a
multiset $m$ of pending handler instances. For a configuration $c$, we write $c.d$ and
$c.m$ for the global state and the multiset in the configuration respectively. The
initial configuration $c_0$ of $\mathfrak{P}$ is given by $c_0.d = d_0$ and $c_0.m = m_0$. The semantics
of $\mathfrak{P}$ is given as a labeled transition system over the set of configurations, with
the transition relation $\xrightarrow{\sigma} \subseteq (D \times \mathbb{M}[\Sigma]) \times (D \times \mathbb{M}[\Sigma])$ given by


$(d, m \oplus [\sigma]) \xrightarrow{\sigma} (d', m \oplus m')$ iff $\exists w \in L_{d\sigma d'} \colon \mathrm{Parikh}(w) = m'$


We use $\to^*$ for the reflexive transitive closure of the transition relation. A configuration
$c$ is said to be reachable in $\mathfrak{P}$ if $(d_0, m_0) \to^* c$.


Intuitively, the set $\Sigma$ of handler names specifies a finite set of procedures
that can be invoked asynchronously. The shared state takes values in $D$. When
a handler is called asynchronously, it gets added to a bag of pending handler
calls (the multiset $m$ in a configuration). The language $L_{d\sigma d'}$ captures the effect
of executing an instance of $\sigma$ starting from the global state $d$, such that on
termination, the global state is $d'$. Each word $w \in L_{d\sigma d'}$ captures a possible
sequence of handlers posted during the execution.

Suppose the current configuration is $(d, m)$. A non-deterministic scheduler
picks one of the outstanding handlers $\sigma \in m$ and executes it. Executing $\sigma$
corresponds to picking one of the languages $L_{d\sigma d'}$ and some word $w \in L_{d\sigma d'}$.
Upon execution of $\sigma$, the new configuration has global state $d'$ and the new bag
of pending calls is obtained by taking $m$, removing an instance of $\sigma$ from it,
and adding the Parikh image of $w$ to it. This reflects the current set of pending
handler calls: the old ones (minus an instance of $\sigma$) together with the new ones
added by executing $\sigma$. Note that a handler is executed atomically; thus, we
atomically update the global state and the effect of executing the handler.
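
The transition rule can be made concrete with a short Python sketch. It rests on simplifying assumptions that are ours, not the paper's: every handler language is a finite set of words given explicitly, handler names are single characters, and the toy languages in `LANGS` are hypothetical. The class C in the paper is far more general.

```python
from collections import Counter


def successors(languages, d, m):
    """Yield (sigma, d2, m2) for each step from configuration (d, m).

    `languages` maps a context (d, sigma, d2) to a finite set of words.
    """
    for sigma in [s for s, k in m.items() if k > 0]:
        rest = m - Counter([sigma])               # remove one instance of sigma
        for (d1, s, d2), words in languages.items():
            if d1 == d and s == sigma:
                for w in words:
                    # Only the Parikh image (letter counts) of w matters.
                    yield sigma, d2, rest + Counter(w)


# Hypothetical toy program with global states {0, 1} and handlers s, a, b.
LANGS = {
    (0, "s", 0): {"", "ab"},   # s posts nothing, or one a and one b
    (0, "a", 1): {""},         # a flips the state from 0 to 1
    (1, "a", 1): {"a"},        # otherwise a re-posts itself
    (1, "b", 0): {""},         # b flips the state from 1 to 0
    (0, "b", 0): {"b"},        # otherwise b re-posts itself
}

for sigma, d2, m2 in successors(LANGS, 0, Counter("ab")):
    print(sigma, d2, sorted(m2.elements()))
```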

Let us see some examples of asynchronous programs. It is convenient to
present these examples in a programming language syntax, and to allow each
handler to have internal actions that perform local tests and updates to the
global state. As we describe informally below, and formally in the full version,
when C is a full trio, internal actions can be “compiled away” by taking an intersection
with a regular language of internal actions and projecting the internal
actions away. Thus, we use our simpler model throughout.

 1 global var turn = ref 0 and x = ref 0;
 2 let rec s1 () = if * then begin post a; s1(); post b end
 3 let rec s2 () = if * then begin post a; s2(); post b end else post b
 4 let a () = if !turn = 0 then begin turn := 1; x := !x + 1 end else post a
 5 let b () = if !turn = 1 then begin turn := 0; x := !x - 1 end else post b
 6
 7 let s3 () = post s3; post s3
 8
 9 global var t = ref 0;
10 let c () = if !t = 0 then t := 1 else post c
11 let d () = if !t = 1 then t := 2 else post d
12 let f () = if !t = 2 then t := 0 else post f
13
14 let cc x = post c; x
15 let dd x = post d; x
16 let ff x = post f; x
17 let id x = x
18 let h g y = cc (g (dd y))
19 let rec produce g x = if * then produce (h g) (ff x) else g x
20 let s4 () = produce id ()

Fig. 1. Examples of asynchronous programs


Examples Figure 1 shows some simple examples of asynchronous programs in an
OCaml-like syntax. Consider first the asynchronous program in lines 1-5. The
alphabet of handlers is s1, s2, a, and b. The global states correspond to possible
valuations to the global variables turn and x; assuming turn is a Boolean and
x takes values in $\mathbb{N}$, we have that $D = \{0,1\} \times \{0,1,\omega\}$, where $\omega$ abstracts
all values other than $\{0,1\}$. Since s1 and s2 do not touch any variables, for
$d, d' \in D$, we have $L_{d\,\mathtt{s1}\,d} = \{a^n b^n \mid n \geq 0\}$, $L_{d\,\mathtt{s2}\,d} = \{a^n b^{n+1} \mid n \geq 0\}$, and
$L_{d\,\mathtt{s1}\,d'} = L_{d\,\mathtt{s2}\,d'} = \emptyset$ if $d' \neq d$.

For the languages corresponding to a and b, we use syntactic sugar in the
form of internal actions; these are local tests and updates to the global state. For
our example, we have, e.g., $L_{(0,0),a,(1,1)} = \{\varepsilon\}$, $L_{(1,x),a,(1,x)} = \{a\}$ for all values
of $x$, and similarly for b. The meaning is that, starting from a global state (0, 0),
executing the handler will lead to the global state (1,1) and no handlers will be
posted, whereas starting from a global state in which turn is 1, executing the
handler will keep the global state unchanged but post an instance of a. Note
that all the languages are context-free.

Consider an execution of the program from the initial configuration
$((0, 0), [\mathtt{s1}])$. The execution of s1 puts $n$ as and $n$ bs into the bag, for some
$n \geq 0$. The global variable turn is used to ensure that the handlers a and b
alternately update x. When turn is 0, the handler for a increments x and sets
turn to 1, otherwise it re-posts itself for a future execution. Likewise, when turn
is 1, the handler for b decrements x and sets turn back to 0, otherwise it re-posts
itself for a future execution. As a result, the variable x never grows beyond 1.
Thus, the program satisfies the safety property that no execution sets x to $\omega$.

It is possible that the execution goes on forever: for example, if s1 posts
an a and a b, and thereafter only b is chosen by the scheduler. This is not an 
“interesting” infinite execution as it is not fair to the pending a. In the case 
of a fair scheduler, which eventually always picks an instance of every pending 
task, the program terminates: eventually all the as and bs are consumed when 
they are scheduled in alternation. However, if instead we started with [s2], the 
program will not terminate even under a fair scheduler: the last remaining b will 
not be paired and will keep executing and re-posting itself forever. 
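
The behaviours just described can be replayed with a bounded simulation of lines 1-5 of Figure 1. The sketch below makes several assumptions of ours: the non-deterministic choice `*` is replaced by an explicit parameter `n`, the fair scheduler is modelled as "run the least recently executed pending handler name", and the unfair one always prefers b. A run that exhausts the step budget only suggests non-termination; it does not prove it.

```python
from collections import Counter


def simulate(start, n, pick, max_steps=1000):
    """Run lines 1-5 of Fig. 1 from the bag [start]; return (terminated, max_x)."""
    turn, x, max_x = 0, 0, 0
    bag = Counter([start])
    last_run = {}                          # handler name -> step of last execution
    for step in range(max_steps):
        pending = sorted(h for h, k in bag.items() if k > 0)
        if not pending:
            return True, max_x             # empty bag: the run has terminated
        h = pick(pending, last_run)
        last_run[h] = step
        bag[h] -= 1
        if h == "s1":
            bag.update("a" * n + "b" * n)          # posts n a's and n b's
        elif h == "s2":
            bag.update("a" * n + "b" * (n + 1))    # posts one extra b
        elif h == "a":
            if turn == 0:
                turn, x = 1, x + 1
            else:
                bag["a"] += 1                      # re-post a
        elif h == "b":
            if turn == 1:
                turn, x = 0, x - 1
            else:
                bag["b"] += 1                      # re-post b
        max_x = max(max_x, x)
    return False, max_x                    # step budget exhausted


def fair(pending, last_run):
    """Pick the pending handler name that was executed least recently."""
    return min(pending, key=lambda h: last_run.get(h, -1))


def prefer_b(pending, last_run):
    """An unfair scheduler: run b whenever some b is pending."""
    return "b" if "b" in pending else pending[0]


print(simulate("s1", 3, fair))       # terminates, x never exceeds 1
print(simulate("s2", 3, fair))       # the unpaired b keeps re-posting itself
print(simulate("s1", 1, prefer_b))   # unfair run: the pending a is never picked
```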

Now consider the execution of s3. It has an infinite fair run, where the 
scheduler picks an instance of s3 at each step. However, the number of pend- 
ing instances grows without bound. We shall study the boundedness problem, 
which checks if the bag can become unbounded along some run. We also study 
a stronger notion of fair termination, called fair non-starvation, which asks that 
every instance of a posted handler is executed under any fair scheduler. The 
execution of s3 is indeed fair, but there can be a specific instance of s3 that is 
never picked: we say s3 can starve an instance. 
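
To see both unboundedness and starvation for s3 concretely, one can tag each pending instance with an identifier. The sketch below assumes a scheduler of our choosing that always runs the most recently posted instance; the identifiers are an illustrative device and not part of the formal model, in which instances of the same handler are indistinguishable.

```python
def run_s3_newest_first(steps):
    """Run s3 for `steps` steps, always executing the newest pending instance."""
    pending = [0]            # ids of pending s3 instances, oldest first
    next_id = 1
    executed = []
    for _ in range(steps):
        executed.append(pending.pop())        # the newest instance runs
        pending += [next_id, next_id + 1]     # 'post s3; post s3'
        next_id += 2
    return pending, executed


pending, executed = run_s3_newest_first(5)
print(len(pending))        # the bag grows by one instance per step
print(1 in executed)       # instance 1 is never executed under this scheduler
```

Every step executes some instance of s3, so the run is fair in the sense used here, yet the instance with identifier 1 stays pending forever.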

The program in lines 9-20 is higher-order (produce and h take functions as
arguments). The language of s4 is the set $\{c^n d^n f^n \mid n \geq 0\}$, that is, it posts an
equal number of cs, ds, and fs. It is an indexed language; we shall see (Section 5)
how this and other higher-order programs can be represented using higher-order
recursion schemes (HORS). Note the OCaml types of $\mathtt{produce} : (o \to o) \to o \to o$
and $\mathtt{h} : (o \to o) \to o \to o$ are higher-order.
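
The higher-order structure of lines 14-20 can be mirrored with Python closures. In this sketch, which is ours and only illustrative, the choice `*` is resolved to "true" exactly `n` times, and `post` records handler names in a list. Python evaluates arguments eagerly, so the recorded order is f, then d, then c; since the semantics only uses the Parikh image of the posted word, the sketch checks the counts.

```python
from collections import Counter


def posted_by_s4(n):
    """Handlers posted by s4 when the choice '*' is taken n times."""
    posted = []

    def post(name, x):
        posted.append(name)
        return x

    cc = lambda x: post("c", x)          # line 14
    dd = lambda x: post("d", x)          # line 15
    ff = lambda x: post("f", x)          # line 16
    ident = lambda x: x                  # line 17

    def h(g):                            # line 18, curried in g
        return lambda y: cc(g(dd(y)))

    def produce(g, x, k):                # line 19, with an explicit counter k
        if k > 0:
            return produce(h(g), ff(x), k - 1)
        return g(x)

    produce(ident, None, n)              # line 20
    return posted


print(Counter(posted_by_s4(3)))
```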

The program is similar to the first: the handlers c, d, and f execute in “round 
robin” fashion using the global state t to find their turns. Again, we use internal 
actions to update the global state for readability. We ask the same decision 
questions as before: does the program ever reach a specific global state and 
does the program have an infinite (fair) run? We shall see later that safety and 
termination questions remain decidable, whereas fair termination does not. 


3 Decision Problems on Asynchronous Programs


We now describe decision problems on runs of asynchronous programs.


Runs, preruns, and downclosures A prerun of an AP $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$
is a finite or infinite sequence $\rho = (e_0, n_0), \sigma_1, (e_1, n_1), \sigma_2, \ldots$ of alternating
elements of tuples $(e_i, n_i) \in D \times \mathbb{M}[\Sigma]$ and symbols $\sigma_i \in \Sigma$. The set of preruns
of $\mathfrak{P}$ will be denoted $\mathrm{Preruns}(\mathfrak{P})$. Note that if two asynchronous programs $\mathfrak{P}$
and $\mathfrak{P}'$ have the same $D$ and $\Sigma$, then $\mathrm{Preruns}(\mathfrak{P}) = \mathrm{Preruns}(\mathfrak{P}')$. The length,
denoted $|\rho|$, of a finite prerun $\rho$ is the number of configurations in $\rho$. The $i$-th
configuration of a prerun $\rho$ will be denoted $\rho(i)$.

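We define an order $\trianglelefteq$ on preruns as follows: For preruns $\rho =
(e_0, n_0), \sigma_1, (e_1, n_1), \sigma_2, \ldots$ and $\rho' = (e'_0, n'_0), \sigma'_1, (e'_1, n'_1), \sigma'_2, \ldots$, we define $\rho \trianglelefteq \rho'$
if $|\rho| = |\rho'|$ and $e_i = e'_i$, $\sigma_i = \sigma'_i$ and $n_i \preceq n'_i$ for each $i \geq 0$. The downclosure ${\downarrow}R$
of a set $R$ of preruns of $\mathfrak{P}$ is defined as ${\downarrow}R = \{\rho \in \mathrm{Preruns}(\mathfrak{P}) \mid \exists \rho' \in R.\ \rho \trianglelefteq \rho'\}$.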

A run of an AP $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$ is a prerun $\rho =
(d_0, m_0), \sigma_1, (d_1, m_1), \sigma_2, \ldots$ starting with the initial configuration $(d_0, m_0)$,
where for each $i \geq 0$, we have $(d_i, m_i) \xrightarrow{\sigma_{i+1}} (d_{i+1}, m_{i+1})$. The set of runs of
$\mathfrak{P}$ is denoted $\mathrm{Runs}(\mathfrak{P})$ and ${\downarrow}\mathrm{Runs}(\mathfrak{P})$ is its downclosure with respect to $\trianglelefteq$.

An infinite run $c_0 \xrightarrow{\sigma_1} c_1 \xrightarrow{\sigma_2} \ldots$ is fair if for all $i \geq 0$, if $\sigma \in c_i.m$ then
there is some $j \geq i$ such that $c_j \xrightarrow{\sigma} c_{j+1}$. That is, whenever an instance of a
handler is posted, some instance of the handler is executed later. Fairness does
not preclude that a specific instance of a handler is never executed. An infinite
fair run starves handler $\sigma$ if there exists an index $J \geq 0$ such that for each $j \geq J$,
we have (i) $c_j.m(\sigma) \geq 1$ and (ii) whenever $c_j \xrightarrow{\sigma} c_{j+1}$, we have $c_j.m(\sigma) \geq 2$. In
this case, even if the run is fair, a specific instance of $\sigma$ may never be executed.

Now we give the definitions of the various decision problems. 


Definition 2 (Properties of finite runs). The Safety (Global state
reachability) problem asks, given an asynchronous program $\mathfrak{P}$ and a global
state $d_f \in D$, is there a reachable configuration $c$ such that $c.d = d_f$? If so, $d_f$
is said to be reachable (in $\mathfrak{P}$) and unreachable otherwise. The Boundedness
(of the task buffer) problem asks, given an asynchronous program $\mathfrak{P}$, is there
an $N \in \mathbb{N}$ such that for every reachable configuration $c$, we have $|c.m| \leq N$?
If so, the asynchronous program $\mathfrak{P}$ is bounded; otherwise it is unbounded. The
Configuration reachability problem asks, given an asynchronous program $\mathfrak{P}$
and a configuration $c$, is $c$ reachable?


Definition 3 (Properties of infinite runs). All the following problems take
as input an asynchronous program $\mathfrak{P}$. The Termination problem asks if all runs
of $\mathfrak{P}$ are finite. The Fair Non-termination problem asks if $\mathfrak{P}$ has some fair
infinite run. The Fair Starvation problem asks if $\mathfrak{P}$ has some fair run that
starves some handler.


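Our main result in this section shows that many properties of an asynchronous
program $\mathfrak{P}$ only depend on the downclosure $\downarrow\mathsf{Runs}(\mathfrak{P})$ of the set
$\mathsf{Runs}(\mathfrak{P})$ of runs of the program $\mathfrak{P}$. The proof is by induction on the length
of runs. For any AP $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$, we define the AP
$\downarrow\mathfrak{P} = (D, \Sigma, (\downarrow L_c)_{c \in \mathfrak{C}}, d_0, m_0)$, where $\downarrow L_c$ is the downclosure of the language $L_c$ under
the subword order.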
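
For a finite language the downclosure under the subword order can be computed by brute force. The following Python sketch is meant only to illustrate the definition; the function names are ours, and handler languages in the paper are in general infinite, so this is not how downclosures are obtained there.

```python
from itertools import combinations

def subwords(w):
    """All scattered subwords of w, obtained by deleting any set of positions."""
    return {"".join(w[i] for i in idx)
            for r in range(len(w) + 1)
            for idx in combinations(range(len(w)), r)}

def downclosure(language):
    """Downclosure of a finite language, given as a set of strings."""
    return set().union(*(subwords(w) for w in language))
```

For instance, `downclosure({"ab"})` is `{"", "a", "b", "ab"}`: a handler that may post the word `ab` may, in the downclosed program, also post only `a`, only `b`, or nothing.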


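Proposition 1. Let $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$ be an asynchronous program.
Then $\downarrow\mathsf{Runs}(\downarrow\mathfrak{P}) = \downarrow\mathsf{Runs}(\mathfrak{P})$. In particular, the following holds. (1) For every
$d \in D$, $\mathfrak{P}$ can reach $d$ if and only if $\downarrow\mathfrak{P}$ can reach $d$. (2) $\mathfrak{P}$ is terminating if and
only if $\downarrow\mathfrak{P}$ is terminating. (3) $\mathfrak{P}$ is bounded if and only if $\downarrow\mathfrak{P}$ is bounded.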


Intuitively, safety, termination, and boundedness are preserved when the multiset
of pending handler instances is “lossy”: posted handlers can get lost. This
corresponds to these handlers never being scheduled by the scheduler. However, 
if a run demonstrates reachability of a global state, or non-termination, or un- 
boundedness, in the lossy version, it corresponds also to a run in the original 
problem (and conversely). In contrast, simple examples show that configura- 
tion reachability, fair termination, and fair non-starvation properties are not 
preserved under downclosures. 

4 General Decidability Results


In this section, we characterize those full trios C for which particular problems 
for asynchronous programs over C are decidable. Our decision procedures will 
use the following theorem, summarizing the results from [12], as a subprocedure. 


Theorem 1 ([12]). Safety, boundedness, configuration reachability, termi- 
nation, fair non-termination, and fair non-starvation are decidable for asyn- 
chronous programs over regular languages. 


4.1 Safety and termination

Our first main result concerns the problems of safety and termination.

Theorem 2. Let C be a full trio. The following are equivalent:


(i) Safety is decidable for asynchronous programs over C.
(ii) Termination is decidable for asynchronous programs over C.
(iii) Emptiness is decidable for C.


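We begin with “(i)$\Rightarrow$(iii)”. Let $K \subseteq \Sigma^*$ be given. We construct
$\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$ such that $m_0 = [\sigma]$, $D = \{d_0, d_1\}$, $L_{d_0,\sigma,d_1} = K$ and
$L_c = \emptyset$ for $c \ne (d_0, \sigma, d_1)$. We see that $\mathfrak{P}$ can reach $d_1$ iff $K$ is non-empty. Next
we show “(ii)$\Rightarrow$(iii)”. Consider the alphabet $\Gamma = (\Sigma \cup \{\varepsilon\}) \times \{0,1\}$ and the
homomorphisms $g\colon \Gamma^* \to \Sigma^*$ and $h\colon \Gamma^* \to \{\sigma\}^*$, where for $x \in \Sigma \cup \{\varepsilon\}$, we have
$g((x,i)) = x$ for $i \in \{0,1\}$, $h((x,1)) = \sigma$, and $h((x,0)) = \varepsilon$. If $R \subseteq \Gamma^*$ is the
regular set of words in which exactly one position belongs to the subalphabet
$(\Sigma \cup \{\varepsilon\}) \times \{1\}$, then the language $K' := h(g^{-1}(K) \cap R)$ belongs to C. Note
that $K'$ is $\emptyset$ or $\{\sigma\}$, depending on whether $K$ is empty or not. We construct
$\mathfrak{P} = (D, \{\sigma\}, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$ with $D = \{d_0\}$, $m_0 = [\sigma]$, $L_{d_0,\sigma,d_0} = K'$ and all
languages $L_c = \emptyset$ for $c \ne (d_0, \sigma, d_0)$. Then $\mathfrak{P}$ is terminating iff $K$ is empty.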

To prove “(iii)$\Rightarrow$(i)”, we design an algorithm deciding safety assuming
decidability of emptiness. Given asynchronous program $\mathfrak{P}$ and state $d$ as input, the
algorithm consists of two semi-decision procedures: one which searches for a run
of $\mathfrak{P}$ reaching the state $d$, and the second which enumerates regular
overapproximations $\mathfrak{P}'$ of $\mathfrak{P}$ and checks the safety of $\mathfrak{P}'$ using Theorem 1. Each $\mathfrak{P}'$ consists
of a regular language $A_c$ overapproximating $L_c$ for each context $c$ of $\mathfrak{P}$. We use
decidability of emptiness to check that $L_c \cap (\Sigma^* \setminus A_c) = \emptyset$ to ensure that $\mathfrak{P}'$ is
indeed an overapproximation.

The algorithm clearly gives a correct answer if it terminates. Hence, we only 
have to argue that it always does terminate. Of course, if d is reachable, the first 
semi-decision procedure will terminate. In the other case, termination is due to 
the regularity of downclosures: if $d$ is not reachable in $\mathfrak{P}$, then Proposition 1
tells us that $\downarrow\mathfrak{P}$ cannot reach $d$ either. But $\downarrow\mathfrak{P}$ is an asynchronous program over
regular languages; this means there exists a safe regular overapproximation and 
the second semi-decision procedure terminates. 
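
The way the two semi-decision procedures are combined can be sketched as follows. This is a toy illustration and not the algorithm itself: instead of an asynchronous program we use a hypothetical counter that starts at 0 and is increased by 2 in every step, the "overapproximations" are invariants of the form x ≡ 0 (mod k), and the target is assumed to be a non-negative integer. All names are ours.

```python
from itertools import count

def dovetail(search_witness, search_certificate):
    """Alternate between two semi-decision procedures, given as generators that
    yield False while searching and True once they succeed."""
    for found_witness, found_certificate in zip(search_witness, search_certificate):
        if found_witness:
            return "reachable"
        if found_certificate:
            return "unreachable"

def runs_reaching(target):
    """First procedure: enumerate longer and longer runs 0, 2, 4, ..."""
    x = 0
    while True:
        yield x == target
        x += 2

def safe_overapproximations(target):
    """Second procedure: enumerate candidate invariants x = 0 (mod k) and check
    that the candidate is closed under +2 and excludes the target."""
    for k in count(1):
        inductive = (2 % k == 0)
        yield inductive and target % k != 0

def decide(target):
    return dovetail(runs_reaching(target), safe_overapproximations(target))
```

Exactly as in the argument above, correctness is immediate once an answer is produced, and termination needs the extra fact that a suitable certificate exists whenever no witness does (here the invariant for k = 2; in the paper, the downclosed program).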

Like the algorithm for safety, the algorithm for termination consists of two
semi-decision procedures. By standard well-quasi-ordering arguments, an infinite
run of an asynchronous program $\mathfrak{P}$ is witnessed by a finite self-covering run.
The first semi-decision procedure enumerates finite self-covering runs (trying to
show non-termination). The second procedure enumerates regular asynchronous
programs $\mathfrak{P}'$ that overapproximate $\mathfrak{P}$. As before, to check termination of $\mathfrak{P}'$, it
applies the procedure from Theorem 1. Clearly, the algorithm’s answer is always
correct. Moreover, it gives an answer for every input. If $\mathfrak{P}$ does not terminate, it
will find a self-covering sequence. If $\mathfrak{P}$ does terminate, then Proposition 1 tells
us that $\downarrow\mathfrak{P}$ is a terminating finite-state overapproximation. This implies that the
second procedure will terminate in that case.

Let us point out a particular example. The class $\mathcal{L}$ of languages of lossy channel
systems is defined like the class of languages of WSTS with upward-closed sets
of accepting configurations as in [13], except that we only consider lossy channel
systems [1] instead of arbitrary Well-Structured Transition Systems (WSTS).
Then $\mathcal{L}$ forms a full trio with decidable emptiness. Although downclosures of
lossy channel languages are not effectively computable (an easy consequence of
[27]), our algorithm employs Theorem 2 to decide safety and termination.


4.2 Boundedness

Theorem 3. Let C be a full trio. The following are equivalent:


(i) Boundedness is decidable for asynchronous programs over C.
(ii) Finiteness is decidable for C.


Clearly, the construction for “(i)$\Rightarrow$(iii)” of Theorem 2 also works for “(i)$\Rightarrow$(ii)”:
$\mathfrak{P}$ is unbounded iff $K$ is infinite.

For the converse, we first note that if finiteness is decidable for C then so is
emptiness. Given $L \subseteq \Sigma^*$ from C, consider the homomorphism $h\colon (\Sigma \cup \{\lambda\})^* \to \Sigma^*$
with $h(a) = a$ for every $a \in \Sigma$ and $h(\lambda) = \varepsilon$. Then $h^{-1}(L)$ belongs to C and
$h^{-1}(L)$ is finite if and only if $L$ is empty: in the inverse homomorphism, $\lambda$ can
be arbitrarily inserted in any word. By Theorem 2, this implies that we can also
decide safety. As a consequence of considering only full trios, it is easy to see that
the problem of context reachability reduces to safety: a context $\hat{c} = (\hat{d}, \hat{\sigma}, \hat{d}') \in \mathfrak{C}$
is reachable in $\mathfrak{P}$ if there is a reachable configuration $(\hat{d}, m)$ in $\mathfrak{P}$ with $m(\hat{\sigma}) \ge 1$.
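
The step from finiteness to emptiness can be spelled out by unfolding the definition of the inverse homomorphism for a single word; the display below only restates the argument in the text.

```latex
\[
  h^{-1}(w) \;=\; \lambda^* a_1 \lambda^* a_2 \lambda^* \cdots \lambda^* a_n \lambda^*
  \qquad\text{for } w = a_1 a_2 \cdots a_n \in \Sigma^*,
\]
\[
  h^{-1}(L) \;=\; \bigcup_{w \in L} h^{-1}(w), \qquad
  \text{so } h^{-1}(L) = \emptyset \text{ if } L = \emptyset
  \text{ and } h^{-1}(L) \text{ is infinite otherwise.}
\]
```

In particular, even for the empty word we get the infinite set of all words consisting only of the fresh letter, so a single word in the language already makes the preimage infinite.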

We now explain our algorithm for deciding boundedness of a given
asynchronous program $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$. For every context $c$, we
first check if $L_c$ is infinite (feasible by assumption). This partitions the set of
contexts of $\mathfrak{P}$ into sets $I$ and $F$ which are the contexts for which the corresponding
language $L_c$ is infinite and finite respectively. If any context in $I$ is reachable,
then $\mathfrak{P}$ is unbounded. Otherwise, all the reachable contexts have a finite
language. For every finite language $L_c$ for some $c \in F$, we explicitly find all the
members of $L_c$. This is possible because equality of $L_c$ with any given finite set $A$
can be checked, so we can enumerate finite sets $A$ until one of them equals $L_c$.
The inclusion $L_c \subseteq A$ can be checked by testing whether $L_c \cap (\Sigma^* \setminus A) = \emptyset$, since
$L_c \cap (\Sigma^* \setminus A)$ effectively belongs to C. On the other hand, checking $A \subseteq L_c$ just
means checking whether $L_c \cap \{w\} \ne \emptyset$ for each $w \in A$, which can be done the
same way. We can now construct asynchronous program $\mathfrak{P}'$ which replaces all
languages for contexts in $I$ by $\emptyset$ and replaces those corresponding to $F$ by the
explicit description. Clearly $\mathfrak{P}'$ is bounded iff $\mathfrak{P}$ is bounded (since no contexts
from $I$ are reachable) and the former can be decided by Theorem 1.
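
The step that lists all members of a finite handler language using only emptiness checks can be sketched in Python. This is a variant chosen for brevity, not the procedure as stated: rather than enumerating arbitrary finite sets, it grows the candidate set by word length. The oracle `intersects(L, predicate)`, which stands for the emptiness check on the intersection of the language with a regular language, and its brute-force toy implementation are assumptions of this sketch.

```python
from itertools import count, product

def words_up_to(alphabet, n):
    """All words over the alphabet of length at most n."""
    for k in range(n + 1):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)

def enumerate_finite_language(L, alphabet, intersects):
    """List a language that is known to be finite, using only the oracle."""
    for n in count():
        # The candidate is included in L: w is kept iff L meets {w}.
        candidate = {w for w in words_up_to(alphabet, n)
                     if intersects(L, lambda v, w=w: v == w)}
        # L is included in the candidate iff L misses its complement.
        if not intersects(L, lambda v: v not in candidate):
            return candidate

def toy_intersects(L, predicate):
    """Toy oracle: here L is simply a finite Python set of strings."""
    return any(predicate(w) for w in L)
```

The loop terminates precisely because the language is finite; for an infinite language the complement test would never succeed, which is why the finiteness check has to come first.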

We observe that boundedness is strictly harder than safety or termination: 
There are full trios for which emptiness is decidable, but finiteness is undecidable, 
such as the languages of reset vector addition systems [11] (see [32] for a definition 
of the language class) and languages of lossy channel systems. 


4.3 Configuration reachability and liveness properties 


Theorems 2 and 3 completely characterize for which full trios safety, termina- 
tion, and boundedness are decidable. We turn to configuration reachability, fair 
termination, and fair starvation. We suspect that it is unlikely that there is a 
simple characterization of those language classes for which the latter problems 
are decidable. However, we show that they are decidable for a limited range of 
infinite-state systems. To this end, we prove that decidability of any of these 
problems implies decidability of the others as well, and also implies the decid- 
ability of a simple combinatorial problem that is known to be undecidable for 
many expressive classes of languages. 

Let $Z \subseteq \{a,b\}^*$ be the language $Z = \{w \in \{a,b\}^* \mid |w|_a = |w|_b\}$. The
Z-intersection problem for a language class C asks, given a language $K \subseteq \{a,b\}^*$
from C, whether $K \cap Z \ne \emptyset$. Informally, $Z$ is the language of all words with an
equal number of $a$s and $b$s and the Z-intersection problem asks if there is a word
in $K$ with an equal number of $a$s and $b$s.


Theorem 4. Let C be a full trio. The following statements are equivalent: 


(i) Configuration reachability is decidable for asynchronous programs over C. 
(ii) Fair termination is decidable for asynchronous programs over C. 
(iii) Fair starvation is decidable for asynchronous programs over C. 


Moreover, if decidability holds, then Z-intersection is decidable for C. 


We prove Theorem 4 by providing reductions among the three problems
and showing that Z-intersection reduces to configuration reachability. We use
diagrams similar to automata to describe asynchronous programs. Here, circles
represent global states of the program and we draw an edge $d \xrightarrow{\sigma|L} d'$ in
case we have $L_{d,\sigma,d'} = L$ in our asynchronous program $\mathfrak{P}$. Furthermore, we
have $L_{d,\sigma,d'} = \emptyset$ whenever there is no edge that specifies otherwise. To simplify
notation, we draw an edge $d \xrightarrow{w|L} d'$ in an asynchronous program for a word
$w \in \Sigma^*$, $w = \sigma_1 \ldots \sigma_n$ with $\sigma_1, \ldots, \sigma_n \in \Sigma$, to symbolize a sequence of states

$$d \xrightarrow{\sigma_1|\{\varepsilon\}} \circ \xrightarrow{\sigma_2|\{\varepsilon\}} \cdots \xrightarrow{\sigma_n|L} d'$$

which removes $[\sigma_1, \ldots, \sigma_n]$ from the task buffer and posts a multiset of handlers
specified by $L$.

Proof of “(ii)$\Rightarrow$(i)” Given an asynchronous program $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$
and a configuration $(d_f, m_f) \in D \times \mathbb{M}[\Sigma]$, we construct asynchronous program
$\mathfrak{P}'$ as follows. Let $z$ be a fresh letter and let $m_f = [\sigma_1, \ldots, \sigma_n]$. We obtain $\mathfrak{P}'$
from $\mathfrak{P}$ by adding a new state $d_f'$ and including the following edges:

[Figure: the edges added to $\mathfrak{P}$ to obtain $\mathfrak{P}'$.]

Starting from $(d_0, m_0 \oplus [z])$, the program $\mathfrak{P}'$ has a fair infinite run iff $(d_f, m_f)$
is reachable in $\mathfrak{P}$. The ‘if’ direction is obvious. Conversely, $z$ has to be executed
in any fair run $\rho$ of $\mathfrak{P}'$ which implies that $d_f'$ is reached by $\mathfrak{P}'$ in $\rho$. Since only $z$
can be executed at $d_f$ in $\rho$, this means that the multiset is exactly $m_f$ when $d_f$
is reached during $\rho$. Clearly this initial segment of $\rho$ corresponds to a run of $\mathfrak{P}$
which reaches the target configuration.

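Proof of “(iii)$\Rightarrow$(ii)” We construct $\mathfrak{P}' = (D, \Sigma', (L_c')_{c \in \mathfrak{C}'}, d_0, m_0')$ given
$\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$ over C as follows. Let $\Sigma' = \Sigma \cup \{s\}$, where $s$ is a fresh
handler. Replace each edge

$$d \xrightarrow{\sigma|L} d' \quad\text{by}\quad d \xrightarrow{\sigma|L \cup Ls} d', \quad\text{together with a loop}\quad d \xrightarrow{s|\{\varepsilon\}} d$$

at every state $d \in D$. Moreover, we set $m_0' = m_0 \oplus [s, s]$. Then $\mathfrak{P}'$ has an infinite
fair run that starves some handler if and only if $\mathfrak{P}$ has an infinite fair run. From
an infinite fair run $\rho$ of $\mathfrak{P}$, we obtain an infinite fair run of $\mathfrak{P}'$ which starves
$s$, by producing $s$ while simulating $\rho$ and consuming it in the loop. Conversely,
from an infinite fair run $\rho'$ of $\mathfrak{P}'$ which starves some $\tau$, we obtain an infinite fair
run $\rho$ of $\mathfrak{P}$ by omitting all productions and consumptions of $s$ and removing two
extra instances of $s$ from all configurations.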

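Proof of “(i)$\Rightarrow$(iii)” From $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$ over C, for each
subset $\Gamma \subseteq \Sigma$ and $\tau \in \Sigma$, we construct an asynchronous program
$\mathfrak{P}_{\Gamma,\tau} = (D', \Sigma', (L_c')_{c \in \mathfrak{C}'}, d_0', m_0')$ over C such that a particular configuration is
reachable in $\mathfrak{P}_{\Gamma,\tau}$ if and only if $\mathfrak{P}$ has a fair infinite run $\rho_{\Gamma,\tau}$, where $\Gamma$ is the set of
handlers that are executed infinitely often in $\rho_{\Gamma,\tau}$ and $\rho_{\Gamma,\tau}$ starves $\tau$. Since there
are only finitely many choices for $\Gamma$ and $\tau$, decidability of configuration
reachability implies decidability of fair starvation. The idea is that a run $\rho_{\Gamma,\tau}$ exists if
and only if there exists a run

$$(d_0, m_0) \xrightarrow{\sigma_1} \cdots \xrightarrow{\sigma_n} (d_n, m_n) = (e_0, n_0) \xrightarrow{\gamma_1} (e_1, n_1) \xrightarrow{\gamma_2} \cdots \xrightarrow{\gamma_k} (e_k, n_k), \tag{1}$$

where $\bigcup_{i=1}^{k} \{\gamma_i\} = \Gamma$, for each $1 \le i \le k$ we have $n_i \in \mathbb{M}[\Gamma]$, $m_n \preceq n_k$, and for each
$i \in \{1, \ldots, k\}$ with $\gamma_i = \tau$, we have $n_{i-1}(\tau) \ge 2$. In such a run, we call
$(d_0, m_0) \xrightarrow{\sigma_1} \cdots \xrightarrow{\sigma_n} (d_n, m_n)$ its first phase and $(e_0, n_0) \xrightarrow{\gamma_1} \cdots \xrightarrow{\gamma_k} (e_k, n_k)$ its second phase.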

Let us explain how $\mathfrak{P}_{\Gamma,\tau}$ reflects the existence of a run as in Eq. (1). The
set $\Sigma'$ of handlers of $\mathfrak{P}_{\Gamma,\tau}$ includes $\Sigma$, $\bar{\Sigma}$ and $\hat{\Sigma}$, where $\bar{\Sigma} = \{\bar{\sigma} \mid \sigma \in \Sigma\}$ and
$\hat{\Sigma} = \{\hat{\sigma} \mid \sigma \in \Sigma\}$ are disjoint copies of $\Sigma$. This means that $\mathbb{M}[\Sigma']$ contains
multisets $m' = m \oplus \bar{m} \oplus \hat{m}$ with $m \in \mathbb{M}[\Sigma]$, $\bar{m} \in \mathbb{M}[\bar{\Sigma}]$, and $\hat{m} \in \mathbb{M}[\hat{\Sigma}]$. A run of
$\mathfrak{P}_{\Gamma,\tau}$ simulates the two phases of $\rho$. While simulating the first phase, $\mathfrak{P}_{\Gamma,\tau}$ keeps
two copies of the task buffer, $m$ and $\bar{m}$. The copying is easily accomplished by a
homomorphism with $\sigma \mapsto \sigma\bar{\sigma}$ for each $\sigma \in \Sigma$. At some point, $\mathfrak{P}_{\Gamma,\tau}$ switches into
simulating the second phase. There, $\bar{m}$ remains unchanged, so that it stores the
value of $m_n$ in Eq. (1) and can be used in the end to make sure that $m_n \preceq n_k$.
Hence, in the second phase, $\mathfrak{P}_{\Gamma,\tau}$ works, like $\mathfrak{P}$, only with $\Sigma$. However,
whenever a handler $\sigma \in \Sigma$ is executed, it also produces a task $\hat{\sigma}$. These handlers are
used at the end to make sure that every $\gamma \in \Gamma$ has been executed at least once
in the second phase. Also, whenever $\tau$ is executed, $\mathfrak{P}_{\Gamma,\tau}$ checks that at least two
instances of $\tau$ are present in the task buffer, thereby ensuring that $\tau$ is starved.
In the end, a distinguished final state allows $\mathfrak{P}_{\Gamma,\tau}$ to execute handlers in $\Gamma$
and $\bar{\Gamma}$ simultaneously to make sure that $m_n \preceq n_k$. In its final state, $\mathfrak{P}_{\Gamma,\tau}$ can
execute handlers $\hat{\gamma} \in \hat{\Gamma}$ and $\gamma \in \Gamma$ (without creating new handlers). In the final
configuration, there can be no $\hat{\sigma}$ with $\sigma \in \Sigma \setminus \Gamma$, and there has to be exactly one
$\hat{\gamma}$ for each $\gamma \in \Gamma$. This guarantees that (i) each handler in $\Gamma$ is executed at least
once during the second phase, (ii) every handler executed in the second phase is
from $\Gamma$, and (iii) $m_n$ contains only handlers from $\Gamma$ (because handlers from $\bar{\Sigma}$
cannot be executed in the second phase).

Decidability of Z-intersection To complete the proof of Theorem 4, we reduce
Z-intersection to configuration reachability. Given $K \subseteq \{a,b\}^*$ from C, we
construct the asynchronous program $\mathfrak{P} = (D, \Sigma, (L_c)_{c \in \mathfrak{C}}, d_0, m_0)$ over C where
$D = \{d_0, 0, 1\}$, $\Sigma = \{a, b, c\}$, by including the following edges:

[Figure: the states $d_0$, $0$ and $1$ with edges labelled $c|K$, $a|\{\varepsilon\}$ and $b|\{\varepsilon\}$.]

The initial task buffer is $m_0 = [c]$. Then clearly, the configuration $(0, [])$ is
reachable in $\mathfrak{P}$ if and only if $K \cap Z \ne \emptyset$.
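
The reduction can be simulated for a finite language $K$. The sketch below rests on our reading of the diagram, which should be taken as an assumption: executing $c$ in $d_0$ posts some word of $K$ and moves to state 0, executing $a$ moves from 0 to 1, and executing $b$ moves from 1 back to 0, with neither $a$ nor $b$ posting anything. The function name and the encoding of configurations are ours.

```python
def empty_buffer_reachable(K):
    """Is the configuration (0, []) reachable, for a finite K over {a, b}?"""
    # After executing c in d0, the buffer is the multiset of letters of some w in K.
    frontier = [("0", w.count("a"), w.count("b")) for w in K]
    seen = set(frontier)
    while frontier:
        state, num_a, num_b = frontier.pop()
        if state == "0" and num_a == 0 and num_b == 0:
            return True
        if state == "0" and num_a > 0:
            successor = ("1", num_a - 1, num_b)   # execute a
        elif state == "1" and num_b > 0:
            successor = ("0", num_a, num_b - 1)   # execute b
        else:
            continue                              # no handler can be executed
        if successor not in seen:
            seen.add(successor)
            frontier.append(successor)
    return False

def intersects_Z(K):
    """Direct check: does K contain a word with equally many a's and b's?"""
    return any(w.count("a") == w.count("b") for w in K)
```

Since the buffer is a multiset, the order of letters in the posted word is irrelevant; the forced alternation between the two states is what compares the number of $a$s with the number of $b$s.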

Theorem 4 is useful in the contrapositive to show undecidability. For example, 
one can show undecidability of Z-intersection for languages of lossy channel 
systems (see Section 4.1): One expresses reachability in a non-lossy FIFO system 
by making sure that the numbers of enqueue- and dequeue-operations match. 
Thus, for asynchronous programs over lossy channel systems, the problems of 
Theorem 4 are undecidable. We also use Theorem 4 in Section 5 to conclude 
undecidability for higher-order asynchronous programs, already at order 2. 


5 Higher-Order Asynchronous Programs 


We apply our general decidability results to asynchronous programs over (deter- 
ministic) higher-order recursion schemes (HORS). Kobayashi [21] has shown how 
higher-order functional programs can be modeled using HORS. In his setting, a 
program contains instructions that access certain resources. For Kobayashi, the 
path language of the HORS is the set of possible sequences of instructions. For 
us, the input program contains post instructions and we translate higher-order 
programs with post instructions into a HORS whose path language is used as
the language of handlers. 

We recall some definitions from [21]. The set of types is defined by the grammar
$A := o \mid A \to A$. The order $\mathrm{ord}(A)$ of a type $A$ is inductively defined as
$\mathrm{ord}(o) = 0$ and $\mathrm{ord}(A \to B) := \max(\mathrm{ord}(A) + 1, \mathrm{ord}(B))$. The arity of a type
is inductively defined by $\mathrm{arity}(o) = 0$ and $\mathrm{arity}(A \to B) = \mathrm{arity}(B) + 1$. We
assume a countably infinite set Var of typed variables $x : A$. For a set $\Theta$ of typed
symbols, the set $\tilde{\Theta}$ of terms generated from $\Theta$ is the least set which contains $\Theta$
such that whenever $s : A \to B$ and $t : A$ belong to $\tilde{\Theta}$, then also $st : B$ belongs
to $\tilde{\Theta}$. By convention the type $o \to (\ldots (o \to (o \to o)))$ is written $o \to \ldots \to o \to o$
and the term $((t_1 t_2) t_3 \cdots) t_n$ is written $t_1 t_2 \cdots t_n$. We write $\bar{x}$ for a sequence
$(x_1, x_2, \ldots, x_n)$ of variables.
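
The inductive definitions of order and arity translate directly into code. The following Python sketch is an illustration only; the class names `Base` and `Arrow` and the function names are ours.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Base:
    """The ground type o."""

@dataclass(frozen=True)
class Arrow:
    """A function type arg -> res."""
    arg: "Type"
    res: "Type"

Type = Union[Base, Arrow]
o = Base()

def order(t):
    """ord(o) = 0 and ord(A -> B) = max(ord(A) + 1, ord(B))."""
    return 0 if isinstance(t, Base) else max(order(t.arg) + 1, order(t.res))

def arity(t):
    """arity(o) = 0 and arity(A -> B) = arity(B) + 1."""
    return 0 if isinstance(t, Base) else arity(t.res) + 1

binary = Arrow(o, Arrow(o, o))                  # o -> o -> o
functional = Arrow(Arrow(o, o), Arrow(o, o))    # (o -> o) -> o -> o
```

Both example types have arity 2, but `binary` has order 1 while `functional` has order 2, because its first argument is itself a function.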

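A higher-order recursion scheme (HORS) is a tuple $\mathcal{S} = (\Sigma, \mathcal{N}, \mathcal{R}, S)$ where
$\Sigma$ is a set of typed terminal symbols of types of order 0 or 1, $\mathcal{N}$ is a set of
typed non-terminal symbols (disjoint from terminal symbols), $S : o$ is the start
non-terminal symbol and $\mathcal{R}$ is a set of rewrite rules $F x_1 x_2 \cdots x_n \twoheadrightarrow t$ where
$F : A_1 \to \cdots \to A_n \to o$ is a non-terminal in $\mathcal{N}$, $x_i : A_i$ for all $i$ are variables
and $t : o$ is a term generated from $\Sigma \cup \mathcal{N} \cup \mathrm{Var}$. The order of a HORS is the
maximum order of a non-terminal symbol. We define a rewrite relation $\twoheadrightarrow$ on
terms over $\Sigma \cup \mathcal{N}$ as follows: $F \bar{a} \twoheadrightarrow t[\bar{x}/\bar{a}]$ if $F \bar{x} \twoheadrightarrow t \in \mathcal{R}$, and if $t \twoheadrightarrow t'$ then
$t s \twoheadrightarrow t' s$ and $s t \twoheadrightarrow s t'$. The reflexive, transitive closure of $\twoheadrightarrow$ is denoted $\twoheadrightarrow^*$. A
sentential form $t$ of $\mathcal{S}$ is a term over $\Sigma \cup \mathcal{N}$ such that $S \twoheadrightarrow^* t$.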


If $N$ is the maximum arity of a symbol in $\Sigma$, then a (possibly infinite) tree over
$\Sigma$ is a partial function $tr$ from $\{0, 1, \ldots, N-1\}^*$ to $\Sigma$ that fulfills the following
conditions: $\varepsilon \in \mathrm{dom}(tr)$, $\mathrm{dom}(tr)$ is closed under prefixes, and if $tr(w) = a$ and
$\mathrm{arity}(a) = k$ then $\{j \mid wj \in \mathrm{dom}(tr)\} = \{0, 1, \ldots, k-1\}$.

A deterministic HORS is one where there is exactly one rule of the form
$F x_1 x_2 \cdots x_n \twoheadrightarrow t$ for every non-terminal $F$. Following [21], we show how a
deterministic HORS can be used to represent a higher-order pushdown language
arising from a higher-order functional program.


Sentential forms can be seen as ranked trees over $\Sigma \cup \mathcal{N} \cup \mathrm{Var}$. A sequence $\Pi$
over $\{0, 1, \ldots, n-1\}$ is a path of $tr$ if every finite prefix of $\Pi$ belongs to $\mathrm{dom}(tr)$. The set
of paths in a tree $tr$ will be denoted $\mathrm{Paths}(tr)$. Note that we are only interested
in finite paths in our context. Associated with any path $\Pi = n_1, n_2, \ldots, n_k$ is the
word $w_\Pi = tr(n_1)\, tr(n_1 n_2) \cdots tr(n_1 n_2 \cdots n_k)$. Let $\Sigma_1 := \{a \in \Sigma \mid \mathrm{arity}(a) = 1\}$.
The path language $L_p(\mathcal{S})$ of a deterministic HORS $\mathcal{S}$ is defined as
$\{\mathrm{Proj}_{\Sigma_1}(w_\Pi) \mid \Pi \in \mathrm{Paths}(T_{\mathcal{S}})\}$. The tree language $L_t(\mathcal{S})$ associated with a HORS is the set of
finite trees over $\Sigma$ generated by $\mathcal{S}$.

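The deterministic HORS corresponding to the higher-order function s3 from
Figure 1 is given by $\mathcal{S} = (\Sigma, \mathcal{N}, \mathcal{R}, S)$, where
$\Sigma = \{\mathtt{br} : o \to o \to o,\ \mathtt{c}, \mathtt{d}, \mathtt{f} : o \to o,\ \mathtt{e} : o\}$
$\mathcal{N} = \{S : o,\ F : (o \to o) \to o \to o,\ H : (o \to o) \to o \to o,\ I : o \to o\}$
$\mathcal{R} = \{S \twoheadrightarrow F\, I\, \mathtt{e},\ I\, x \twoheadrightarrow x,\ F\, G\, x \twoheadrightarrow \mathtt{br}\, (F\, (H\, G)\, (\mathtt{f}\, x))\, (G\, x),$
$\quad H\, G\, x \twoheadrightarrow \mathtt{c}\, (G\, (\mathtt{d}\, x))\}$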
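The path language $L_p(\mathcal{S}) = \{\mathtt{c}^n \mathtt{d}^n \mathtt{f}^n \mid n \ge 0\}$. To see this, apply the reduction
rules to get the value tree $T_{\mathcal{S}}$:

$$\begin{aligned}
S \twoheadrightarrow F\, I\, \mathtt{e} &\twoheadrightarrow \mathtt{br}\, (F\, (H\, I)\, (\mathtt{f}\, \mathtt{e}))\, (I\, \mathtt{e}) \\
&\twoheadrightarrow \mathtt{br}\, (F\, (H\, I)\, (\mathtt{f}\, \mathtt{e}))\, \mathtt{e} \\
&\twoheadrightarrow \mathtt{br}\, (\mathtt{br}\, (F\, (H^2 I)\, (\mathtt{f}^2 \mathtt{e}))\, ((H\, I)\, (\mathtt{f}\, \mathtt{e})))\, \mathtt{e} \\
&\twoheadrightarrow \mathtt{br}\, (\mathtt{br}\, (F\, (H^2 I)\, (\mathtt{f}^2 \mathtt{e}))\, (\mathtt{c}\, (I\, (\mathtt{d}\, \mathtt{f}\, \mathtt{e}))))\, \mathtt{e} \\
&\twoheadrightarrow \mathtt{br}\, (\mathtt{br}\, (F\, (H^2 I)\, (\mathtt{f}^2 \mathtt{e}))\, (\mathtt{c}\, \mathtt{d}\, \mathtt{f}\, \mathtt{e}))\, \mathtt{e} \\
&\twoheadrightarrow \cdots
\end{aligned}$$

[Figure: the value tree $T_{\mathcal{S}}$.]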
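
The shape of the path language in this example can be checked by unfolding the rules a bounded number of times. The Python sketch below is a shallow embedding and only an illustration: non-terminals become Python functions, the rules are taken in our reading of the scheme (with `f x` in the rule for `F` and `d x` in the rule for `H`), the left spine is cut off after `depth` unfoldings, and only branches that end in the leaf `e` are collected. The tree encoding and all names are assumptions of the sketch.

```python
def I(x):
    """Rule I x -> x."""
    return x

def H(G):
    """Rule H G x -> c (G (d x))."""
    return lambda x: ("c", G(("d", x)))

def F(G, x, depth):
    """Rule F G x -> br (F (H G) (f x)) (G x), unfolded `depth` more times."""
    left = F(H(G), ("f", x), depth - 1) if depth > 0 else None
    return ("br", left, G(x))

def branch_words(tree):
    """Words of unary symbols along the branches that end in the leaf e."""
    if tree is None:          # cut-off part of the infinite left spine
        return []
    if tree == "e":
        return [""]
    if tree[0] == "br":
        return branch_words(tree[1]) + branch_words(tree[2])
    return [tree[0] + w for w in branch_words(tree[1])]

words = sorted(branch_words(F(I, "e", 2)), key=len)
```

With two unfoldings, `words` is `["", "cdf", "ccddff"]`, matching the first three words of the claimed path language.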

A HORS $\mathcal{S}$ is called a word scheme if it has exactly one nullary terminal
symbol $\mathtt{e}$ and all other terminal symbols $\tilde{\Sigma}$ are of arity one. The word language
$L_w(\mathcal{S}) \subseteq \tilde{\Sigma}^*$ defined by $\mathcal{S}$ is
$L_w(\mathcal{S}) = \{a_1 a_2 \cdots a_n \mid (a_1 (a_2 \cdots (a_n(\mathtt{e})) \cdots)) \in L_t(\mathcal{S})\}$.
We denote by H the class of languages $L_w(\mathcal{S})$ that occur as the word
language of a higher-order recursion scheme $\mathcal{S}$. Note that path languages and
languages of word schemes are both word languages over the set $\tilde{\Sigma}$ of unary
symbols considered as letters. They are connected by the following proposition.²


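Proposition 2. For every order-$n$ HORS $\mathcal{S} = (\Sigma, \mathcal{N}, \mathcal{R}, S)$ there exists an
order-$n$ word scheme $\mathcal{S}' = (\Sigma', \mathcal{N}', \mathcal{R}', S')$ such that $L_p(\mathcal{S}) = L_w(\mathcal{S}')$.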


A consequence of [21] and Prop. 2 is that the “post” language of higher-order 
functional programs can be modeled as the language of a word scheme. Hence, 
we define an asynchronous program over HORS as an asynchronous program over 
the language class H and we can use the following results on word schemes. 


Theorem 5. HORS and word schemes form effective full trios [7]. Emptiness
[23] and finiteness [29] of order-$n$ word schemes are $(n-1)$-EXPTIME-complete.


Now Theorems 2 and 3, together with Proposition 2 imply the decidability 
results in Corollary 1. The undecidability result is a consequence of Theorem 4 
and the undecidability of the Z-intersection problem for indexed languages or 
equivalently, order-2 pushdown automata as shown in [33]. Order-2 pushdown 
automata can be effectively turned into order-2 OI grammars [10], which in turn 
can be translated into order-2 word schemes [9]. See also [22, Theorem 4]. 


Corollary 1. For asynchronous programs over HORS: (1) Safety, termination, 
and boundedness are decidable. (2) Configuration reachability, fair termination, 
and fair starvation are undecidable already at order-2. 


A Direct Algorithm We say that downclosures are computable for a language
class C if for a given description of a language $L$ in C, one can compute an
automaton for the regular language $\downarrow L$. From Proposition 1 and Theorem 1,
if one can compute downclosures for a language class, then one can avoid the
enumerative approaches of Section 4 and get a “direct algorithm.” The algorithm
replaces each handler by its downclosure and then invokes the decision procedure
summarized in Theorem 1. The direct algorithm for asynchronous programs over
HORS relies on the recent breakthrough results on computing downclosures.

² The models of HORS (used in model checking higher order programs [21]) and word
schemes (used in language-theoretic exploration of downclosures [15,7]) are somewhat
different. Thus, we show an explicit reduction between the two formalisms.


Theorem 6 ([33,15,7]). Downclosures are effectively computable for H. 


Unfortunately, current techniques for computing downclosures do not yet provide
a complexity upper bound as we describe below. In [33], it was shown that in
a full trio C, downclosures are computable if and only if the diagonal problem for C
is decidable. The latter asks, given a language $L \subseteq \Sigma^*$, whether for every $k \in \mathbb{N}$,
there is a word $w \in L$ with $|w|_\sigma \ge k$ for every $\sigma \in \Sigma$. The diagonal problem was
then shown to be decidable for higher-order pushdown automata [15] and then 
for word schemes [7]. The algorithm from [33] to compute downclosures using an 
oracle for the diagonal problem employs enumeration to compute a downclosure 
automaton, thus we have hidden the enumeration into the downclosure compu- 
tation. We conjecture that downclosures can be computed in elementary time 
for word schemes of fixed order. This would imply an elementary time procedure 
for asynchronous programs over HORS of fixed order. 

For handlers over context-free languages, given as PDAs, Ganty and Majumdar
[12] show an EXPSPACE upper bound for safety, termination, and boundedness.
Their algorithm constructs for each handler a polynomial-size Petri net
with certain guarantees (forming a so-called adequate family of Petri nets) that
accepts a Parikh equivalent language. These Petri nets are then used to construct
a larger Petri net, polynomial in the size of the asynchronous program and the 
adequate family of Petri nets, in which safety, termination, or boundedness can 
be phrased as a query decidable in EXPSPACE. 

A natural question is whether a downclosure-based algorithm matches the 
same complexity. We can replace the Parikh-equivalent Petri nets of [12] with 
Petri nets recognizing the downclosure of a language. It is an easy consequence of 
Proposition 1 that the resulting Petri nets can be used in place of the adequate 
families of Petri nets in the procedures for safety, termination, and boundedness 
of [12]. Unfortunately, a finite automaton for $\downarrow L$ may require exponentially many
states in the size of the PDA [4], so a naive approach gives a 2EXPSPACE algorithm.

In the full version of this paper, we show that for each context-free language
$L$, one can construct in polynomial time a 1-bounded Petri net accepting
$\downarrow L$. (Recall that a Petri net is 1-bounded if every reachable marking has at most
one token in each place.) When used in the construction of [12], this matches the
EXPSPACE upper bound for safety, termination, and boundedness verification.

As a byproduct, we get a simple direct construction of a finite automaton
for $\downarrow L$ when $L$ is given as a PDA. This is of independent interest because earlier
constructions of $\downarrow L$ always start from a context-free grammar and produce
(necessarily!) exponentially large NFAs [24,8,4]. The key observation is that the
downclosure of the language of a PDA can be represented, after some simple 
modifications, as the language accepted by the PDA with a bounded stack. 




References


1. Abdulla, P.A., Bouajjani, A., Jonsson, B.: On-the-fly analysis of systems with
   unbounded, lossy FIFO channels. In: Proceedings of the 10th International
   Conference on Computer Aided Verification (CAV 1998). pp. 305-318 (1998).
   https://doi.org/10.1007/BFb0028754
2. Aho, A.V.: Indexed grammars - an extension of context-free grammars. J. ACM
   15(4), 647-671 (1968). https://doi.org/10.1145/321479.321488
3. Atig, M.F., Bouajjani, A., Qadeer, S.: Context-bounded analysis for concurrent
   programs with dynamic creation of threads. In: Proceedings of TACAS 2009. pp.
   107-123 (2009)
4. Bachmeier, G., Luttenberger, M., Schlund, M.: Finite automata for the sub- and
   superword closure of CFLs: Descriptional and computational complexity. In:
   Proceedings of LATA 2015. pp. 473-485 (2015)
5. Berstel, J.: Transductions and context-free languages. Teubner-Verlag (1979)
6. Chadha, R., Viswanathan, M.: Decidability results for well-structured transition
   systems with auxiliary storage. In: CONCUR ’07: Proc. 18th Int. Conf. on
   Concurrency Theory. LNCS, vol. 4703, pp. 136-150. Springer (2007)
7. Clemente, L., Parys, P., Salvati, S., Walukiewicz, I.: The diagonal problem
   for higher-order recursion schemes is decidable. In: Proceedings of
   the 31st Annual ACM/IEEE Symposium on Logic in Computer Science,
   LICS ’16, New York, NY, USA, July 5-8, 2016. pp. 96-105. ACM (2016).
   https://doi.org/10.1145/2933575.2934527
8. Courcelle, B.: On constructing obstruction sets of words. Bulletin of the EATCS
   44, 178-186 (1991)
9. Damm, W.: The IO- and OI-hierarchies. Theoretical Computer Science 20(2),
   95-207 (1982)
10. Damm, W., Goerdt, A.: An automata-theoretical characterization of the
    OI-hierarchy. Information and Control 71(1), 1-32 (1986)
11. Dufourd, C., Finkel, A., Schnoebelen, P.: Reset nets between decidability and
    undecidability. In: Proceedings of ICALP 1998. pp. 103-115 (1998)
12. Ganty, P., Majumdar, R.: Algorithmic verification of asynchronous programs. ACM
    Transactions on Programming Languages and Systems (TOPLAS) 34(1), 6 (2012)
13. Geeraerts, G., Raskin, J., Begin, L.V.: Well-structured languages. Acta Inf.
    44(3-4), 249-288 (2007). https://doi.org/10.1007/s00236-007-0050-3
14. Greibach, S.A.: Remarks on blind and partially blind one-way multicounter
    machines. Theoretical Computer Science 7(3), 311-324 (1978).
    https://doi.org/10.1016/0304-3975(78)90020-8
15. Hague, M., Kochems, J., Ong, C.L.: Unboundedness and downward closures of
    higher-order pushdown automata. In: POPL 2016: Principles of Programming
    Languages. pp. 151-163. ACM (2016)
16. Hague, M., Murawski, A.S., Ong, C.L., Serre, O.: Collapsible pushdown automata
    and recursion schemes. In: Proceedings of the Twenty-Third Annual IEEE
    Symposium on Logic in Computer Science, LICS 2008, 24-27 June 2008, Pittsburgh,
    PA, USA. pp. 452-461 (2008). https://doi.org/10.1109/LICS.2008.34
17. Haines, L.H.: On free monoids partially ordered by embedding. Journal of
    Combinatorial Theory 6(1), 94-98 (1969)
18. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to automata theory,
    languages, and computation, 3rd Edition. Pearson international edition,
    Addison-Wesley (2007)
19. Jantzen, M.: On the hierarchy of Petri net languages. RAIRO - Theoretical
    Informatics and Applications - Informatique Théorique et Applications 13(1), 19-30
    (1979), http://www.numdam.org/item?id=ITA_1979__13_1_19_0
20. Jhala, R., Majumdar, R.: Interprocedural analysis of asynchronous programs. In:
    POPL’07: Proc. 34th ACM SIGACT-SIGPLAN Symp. on Principles of Programming
    Languages. pp. 339-350. ACM Press (2007)
21. Kobayashi, N.: Types and higher-order recursion schemes for verification of
    higher-order programs. In: Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium
    on Principles of Programming Languages, POPL 2009, Savannah, GA, USA,
    January 21-23, 2009. pp. 416-428 (2009). https://doi.org/10.1145/1480881.1480933
22. Kobayashi, N.: Inclusion between the frontier language of a non-deterministic
    recursive program scheme and the Dyck language is undecidable. Theoretical
    Computer Science 777, 409-416 (2019)
23. Kobayashi, N., Ong, C.L.: Complexity of model checking recursion schemes for
    fragments of the modal mu-calculus. Logical Methods in Computer Science 7(4)
    (2011)
24. van Leeuwen, J.: Effective constructions in well-partially-ordered free monoids.
    Discrete Mathematics 21(3), 237-252 (1978).
    https://doi.org/10.1016/0012-365X(78)90156-5
25. Majumdar, R., Thinniyam, R.S., Zetzsche, G.: General decidability results for
    asynchronous shared-memory programs: Higher-order and beyond (2021),
    http://arxiv.org/abs/2101.08611
26. Maslov, A.: The hierarchy of indexed languages of an arbitrary level. Doklady
    Akademii Nauk 217(5), 1013-1016 (1974)
27. Mayr, R.: Undecidable problems in unreliable computations. Theoretical Computer
    Science 297(1-3), 337-354 (2003)
28. Ong, L.: Higher-order model checking: An overview. In: 30th Annual ACM/IEEE
    Symposium on Logic in Computer Science, LICS 2015, Kyoto, Japan, July 6-10,
    2015. pp. 1-15 (2015). https://doi.org/10.1109/LICS.2015.9
29. Parys, P.: The complexity of the diagonal problem for recursion schemes. In:
    Proceedings of FSTTCS 2017. Leibniz International Proceedings in Informatics
    (LIPIcs), vol. 93, pp. 45:1-45:14 (2018)
30. Sen, K., Viswanathan, M.: Model checking multithreaded programs with
    asynchronous atomic methods. In: CAV ’06: Proc. 18th Int. Conf. on Computer Aided
    Verification. LNCS, vol. 4144, pp. 300-314. Springer (2006)
31. Sipser, M.: Introduction to the theory of computation. PWS Publishing Company
    (1997)
32. Thinniyam, R.S., Zetzsche, G.: Regular separability and intersection emptiness
    are independent problems. In: Proceedings of FSTTCS 2019. Leibniz
    International Proceedings in Informatics (LIPIcs), vol. 150, pp. 51:1-51:15.
    Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2019).
    https://doi.org/10.4230/LIPIcs.FSTTCS.2019.51
33. Zetzsche, G.: An approach to computing downward closures. In: ICALP 2015.
    vol. 9135, pp. 440-451. Springer (2015), the undecidability of Z intersection is
    shown in the full version: http://arxiv.org/abs/1503.01068




Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the chapter’s 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will need 
to obtain permission directly from the copyright holder. 


Abate, Alessandro I-370 
Abbasi, Rosa II-242
Adam, Zsófia II-433
Agrawal, Sakshi II-458
Ahmed, Daniele I-370
Ahrendt, Wolfgang II-242
Alur, Rajeev I-430

Amir, Guy I-203 

André, Étienne I-311 
Andrianov, Pavel II-423
Andriushchenko, Roman I-191
Apinis, Kalmer II-438
Arias, Jaime I-311 
Ashok, Pranav I-326 


Backenköhler, Michael I-210 
Baek, Seulkee I-59 

Bansal, Suguman I-20 
Barrett, Clark I-113, II-145, II-203
Bendik, Jaroslav I-291 
Beneš, Nikola I-64 

Beyer, Dirk II-401

Biere, Armin I-133, II-357
Biewer, Sebastian II-365
Bisping, Benjamin I-3
Blondin, Michael II-3
Bonakdarpour, Borzoo I-94 
Bortolussi, Luca I-210 

Brim, Luboš II-64

Bryant, Randal E. I-76 
Budde, Carlos E. I-373 


Carneiro, Mario I-59 

Černá, Ivana I-291 

Češka, Milan I-191 

Chalupa, Marek I-453 
Chatterjee, Krishnendu I-20 
Chattopadhyay, Agnishom I-330 
Chen, Ran II-262

Christakis, Maria I-43 

Cohen, Aviad II-87


Author Index 


Darke, Priyanka II-458
Darulova, Eva I-43, II-242


Erhard, Julian II-438
Ernst, Gidon II-24


Fedyukovich, Grigory II-24
Felgenhauer, Bertram II-127
Ferreira, Margarida I-152
Finkbeiner, Bernd II-365
Furuse, Jun I-262 


Ganesh, Vijay I-303 
Gieseking, Manuel II-381 
Giesl, Jürgen I-250

Gol, Ebru Aydin I-291
Gorostiaga, Felipe II-349
Griggio, Alberto I-113 
Großmann, Gerrit I-210 


Haas, Thomas II-428
Haase, Christoph II-3
Hajdu, Ákos II-433
Hark, Marcel I-250
Hartmanns, Arnd II-373
Hausmann, Daniel I-38 


Hecking-Harbusch, Jesko II-381 
Hermanns, Holger II-365, II-389 
Heule, Marijn J. H. I-59, I-76, I-223


Hojjat, Hossein I-443 
Howar, Falk II-448

Hsu, Tzu-Han I-94 
Huang, Cheng-Chao I-389


Igarashi, Atsushi II-262
Irfan, Ahmed I-113 


Jackermeier, Mathias II-326 
Jašek, Tomáš II-453




Jeangoudoux, Clothilde I-43 
Junges, Sebastian I-173, I-191 


Katoen, Joost-Pieter I-173, I-191, I-230 
Katz, Guy I-203 

Kaufmann, Daniela I-357 

Kawata, Akira II-262

Khoroshilov, Alexey II-423 

Klauck, Michaela II-389 

Kohl, Maximilian A. II-365, II-389
Křetínský, Jan II-326


Lam, Wing I-270

Lepiller, Julien I-105 

Li, Jianlin I-389 

Li, Renjue I-389 

Li, Yahui I-430 

Lochmann, Alexander II-127
Lohar, Debasmita I-43 

Loo, Boon Thau I-430 
Lynce, Inês I-152 


Majumdar, Rupak I-449
Mamouras, Konstantinos I-330
Mann, Makai I-113 
Marinov, Darko I-270 
Martins, Ruben I-152 
Meyer, Fabian I-250
Meyer, Roland II-428 
Middeldorp, Aart II-127
Mitterwallner, Fabian II-127
Mues, Malte II-448

Mutilin, Vadim II-423
Myreen, Magnus O. II-223 


Nadel, Alexander I-87 
Nejati, Saeed II-303 
Nestmann, Uwe I-3 
Niemetz, Aina I-145, II-303 
Nishida, Yuki II-262 
Novak, Jakub I-453 


Offtermatt, Philip I-3 
Osama, Muhammad I-133 


Padon, Oded I-113 
Pastva, Samuel I-64 
Peruffo, Andrea I-370 
Petrucci, Laure I-311 
Piskac, Ruzica II-105 


Platzer, André II-181 

Pol, Jaco van de I-311 
Ponce-de-Leon, Hernán II-428
Preiner, Mathias II-145, II-303


Quatmann, Tim I-230


Rechtatkova, Anna II-453
Reger, Giles I-164 
Reynolds, Andrew II-145 
Rümmer, Philipp II-443 
Ryvchin, Vadim I-87 


Saan, Simmo II-438 
Šafránek, David II-64
Saito, Hiromasa II-262 
Sallai, Gyula I-433 
Sanchez, César I-94, II-349 
Santolucito, Mark II-105 
Schäf, Martin II-105 
Schiffl, Jonas II-242
Schmid, Stefan I-411 
Schnepf, Nicolas I-411
Schnitzer, Yannik II-365
Schoisswohl, Johannes II-164
Schröder, Lutz I-38 
Schwarz, Michael I-438 
Schwenger, Maximilian II-365
Scott, Joseph I-303 

Seidl, Helmut II-438 
Sencan, Ahmet I-291 
Shamakhi, Ali I-443 

Shi, Lei I-430

Sobel, Joshua I-43 
Sokova, Veronika II-453
Sotoudeh, Matthew II-281
Spel, Jip I-173

Srba, Jiří I-411

Strejček, Jan I-453 
Suenaga, Kohei II-262
Sun, Jun I-389 


Tan, Yong Kiam II-181, II-223
Terra-Neves, Miguel I-152
Thakur, Aditya V. II-281
Thinniyam, Ramanathan S. I-449
Tinelli, Cesare II-145


Ulbrich, Mattias II-242


Vardi, Moshe Y. I-20 
Venkatesh, R. II-458 
Ventura, Miguel I-152
Vogler, Ralf II-438
Vojdani, Vesal I-438
Voronkov, Andrei II-164


Wang, Jingyi I-389

Wang, Zhifu I-330

Wei, Anjiang I-270 
Weinhuber, Christoph I-326 
Weininger, Maximilian I-326 
Weiss, Gail I-351 

Wijs, Anton I-133 




Wolf, Verena I-210 
Wu, Haoze I-203 


Xie, Tao I-270 
Xue, Bai I-389 


Yadav, Mayank II-326
Yang, Pengfei I-389
Yanich, Ann I-381 
Yellin, Daniel M. I-351 
Yi, Pu I-270


Zetzsche, Georg I-449
Zhang, Lijun I-389




